nosql for great good [hanoi.rb talk]

NoSQL database for great good @huydx hanoi.rb

Upload: huy-do

Post on 14-Aug-2015


Category:

Engineering



TRANSCRIPT

Page 1: NoSQL for great good [hanoi.rb talk]

NoSQL database for great good

@huydx hanoi.rb

Page 2: NoSQL for great good [hanoi.rb talk]

$> whoami

huy

Software developer

Tokyo-based

Ruby/Scala user

nickname: @huydx

Page 3: NoSQL for great good [hanoi.rb talk]

Disclaimer

This talk is not going to go into detail about any particular NoSQL database

I'm going to talk about: when we need to choose a NoSQL database, how should we think?

Page 4: NoSQL for great good [hanoi.rb talk]

What do people often think about NoSQL?

Page 5: NoSQL for great good [hanoi.rb talk]

• As a cache

• As magic that can make "any" web system faster

Page 6: NoSQL for great good [hanoi.rb talk]

Your system is slow

Just use NoSQL

RDBMS is shit

Page 7: NoSQL for great good [hanoi.rb talk]

RDBMS is not slow

NoSQL is not the cure for everything

Page 8: NoSQL for great good [hanoi.rb talk]

RDBMS is awesome

• Can scan 7M rows/sec with an index!

• Can handle very big data (Facebook)

• Has a very flexible query language (SQL)

• Has some awesome analytics features (window functions in PostgreSQL)

• Has ACID properties

https://www.percona.com/blog/2008/04/09/how-fast-can-mysql-process-data/

Page 9: NoSQL for great good [hanoi.rb talk]

Why ACID is important

• Atomicity: protects transactions (all or nothing)

• Consistency: protects data correctness

• Isolation: protects data from concurrency

• Durability: protects data from failure

ACID makes a database something you can

TRUST
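
To make the "all or nothing" point concrete, here is a minimal sketch using Python's standard-library sqlite3 module. Nothing in it comes from the talk: the accounts table, the transfer helper and the account names are all made up for illustration, and the CHECK constraint stands in for a business rule.

```python
import sqlite3

# Hypothetical schema for illustration: a balance may never go below zero.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between two accounts; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the whole transaction was rolled back.
        return False


def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
```

A transfer that would overdraw the source account fails on the second UPDATE, and the first UPDATE (the credit) is rolled back with it, so the balances stay as they were.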

Page 10: NoSQL for great good [hanoi.rb talk]

So

• RDBMS is way better than you thought

• You should learn to do RDBMS the right way

• Learn how to get the best performance from an RDBMS (index tuning, query optimization, data modeling, master-slave replication, monitoring, sharding the right way...)

https://www.percona.com/blog/2014/03/27/a-conversation-with-5-facebook-mysql-gurus/
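
As one small example of index tuning, the sketch below asks a database how it plans to run a query before and after an index is added. It uses SQLite through Python's sqlite3 module only because it needs no setup; the talk itself refers to MySQL and PostgreSQL, and the users table, the index name and the sample data are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"user{i}") for i in range(1000)],
)

QUERY = "SELECT * FROM users WHERE email = ?"


def query_plan(conn):
    """Return SQLite's plan description for QUERY as one string."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN " + QUERY, ("user42@example.com",)
    ).fetchall()
    # The last column of each row holds the human-readable plan text.
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn)  # no usable index yet: a full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(conn)   # the planner can now search via the index

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan wording differs between SQLite versions, but the second plan should name idx_users_email while the first cannot.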

Page 11: NoSQL for great good [hanoi.rb talk]

But this talk is about NoSQL!!!!!!

Page 12: NoSQL for great good [hanoi.rb talk]

Where is an RDBMS not a good fit?

• Nature of data: when data is not row-column style (multidimensional data)

• How your data scales: data sharding (you don't want to shard)

• ACID is great, but it degrades performance

• Single point of failure: single master

• Data usage: when you need real-time, fast data

https://www.percona.com/blog/2009/08/06/why-you-dont-want-to-shard/
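
One reason manual sharding hurts can be sketched in a few lines of Python. The scheme below (hash the key, take it modulo the shard count) is a common naive approach chosen here as an assumption; it is not taken from the talk or the linked article, and the key format is made up.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard number using a stable hash (SHA-256 here)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def moved_keys(keys, old_shards, new_shards):
    """Keys whose shard changes when the number of shards changes."""
    return [k for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards)]


keys = [f"user:{i}" for i in range(1000)]
moved = moved_keys(keys, 4, 5)
print(f"{len(moved)} of {len(keys)} keys must move when going from 4 to 5 shards")
```

With plain modulo hashing, adding a single shard relocates most keys, which is the kind of operational cost that makes you not want to shard by hand.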

Page 13: NoSQL for great good [hanoi.rb talk]

Let's talk about NoSQL

Page 14: NoSQL for great good [hanoi.rb talk]

We have plenty of players

But when, and how, to use them?

Page 15: NoSQL for great good [hanoi.rb talk]

We have a decent answer: It depends!!

Page 16: NoSQL for great good [hanoi.rb talk]

What do you want to store?

• Geospatial data?

• Important user data? (passwords, payment information...)

• Cache data?

• Real-time analytics data? (write/read intensive)

Page 17: NoSQL for great good [hanoi.rb talk]

Where do you want to store?

• In memory?

• On disk?

  • On slow disk (HDD)

  • On fast disk (SSD)

Page 18: NoSQL for great good [hanoi.rb talk]

How big is your data?

• Able to fit into memory?

• Able to fit into a single machine?

• Not able to fit even into hundreds of machines?
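
The three questions above can be answered with back-of-the-envelope arithmetic. Here is a rough Python sketch; the function name, the machine sizes and the cluster limit are all hypothetical numbers picked for illustration, and it ignores replication, indexes and overhead.

```python
import math

GB = 1024 ** 3


def placement(data_bytes, ram_per_machine, disk_per_machine, max_machines):
    """Classify a dataset by where it can live (a crude estimate only)."""
    if data_bytes <= ram_per_machine:
        return "fits in memory on one machine"
    if data_bytes <= disk_per_machine:
        return "fits on disk on one machine"
    machines = math.ceil(data_bytes / disk_per_machine)
    if machines <= max_machines:
        return f"needs about {machines} machines"
    return "does not fit, even across the whole cluster"


# Assumed hardware: 64 GB RAM and 2 TB disk per machine, at most 100 machines.
RAM, DISK, LIMIT = 64 * GB, 2048 * GB, 100

# Example: 200 million rows at roughly 500 bytes each.
rows, bytes_per_row = 200_000_000, 500
print(placement(rows * bytes_per_row, RAM, DISK, LIMIT))
```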

Page 19: NoSQL for great good [hanoi.rb talk]

Are there any factors for categorizing NoSQL databases?

Page 20: NoSQL for great good [hanoi.rb talk]
Page 21: NoSQL for great good [hanoi.rb talk]

Data Model + Query Model

Page 22: NoSQL for great good [hanoi.rb talk]

NoSQL categorized by data model

Document

Pairs each key with a complex data structure known as a document (JSON, BSON).

MongoDB, CouchDB, RethinkDB

Column Family

One row key is paired with many columns (compare rows in an RDBMS); easy to partition into blocks

Cassandra, HBase, Hypertable

Graph

Stores data as nodes plus links between nodes

Neo4j, FlockDB

KVS

Just a key + a value (a value can be complex, but cannot be as wide as a column family)

Riak, Memcached, Redis, Couchbase
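
To give a feel for the four shapes, here is the same tiny dataset written out as plain Python structures. This is only a sketch of how the data is laid out, not how any of the listed databases store it internally; the second user, the column names and the FOLLOWS relation are invented for the example.

```python
# Document: one key maps to a nested, self-describing structure.
document = {
    "_id": "user:1",
    "name": "huy",
    "languages": ["ruby", "scala"],
    "location": {"city": "Tokyo"},
}

# Column family: one row key maps to many columns, and rows need not
# share the same set of columns.
column_family = {
    "user:1": {"name": "huy", "city": "Tokyo", "lang:ruby": "1", "lang:scala": "1"},
    "user:2": {"name": "an", "city": "Hanoi"},
}

# Graph: nodes plus links between nodes.
nodes = {"user:1": {"name": "huy"}, "user:2": {"name": "an"}}
edges = [("user:2", "FOLLOWS", "user:1")]

# KVS: just a key and a value.
key_value = {"user:1:name": "huy", "user:1:city": "Tokyo"}


def followers(edges, user):
    """Walk the links pointing at a user (the natural query for a graph)."""
    return [src for src, relation, dst in edges if relation == "FOLLOWS" and dst == user]


print(followers(edges, "user:1"))
```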

Page 23: NoSQL for great good [hanoi.rb talk]

What about the merits / demerits of each data model?

Page 24: NoSQL for great good [hanoi.rb talk]

The data model affects how we query data

Users always want the query method to be as flexible as possible

But sometimes we have to face the trade-off between flexibility and scalability

Page 25: NoSQL for great good [hanoi.rb talk]

• Document: queries can be very flexible because documents are examinable (MongoDB has a very rich query language). The data model can also change very flexibly

• Column Family: just a key-value with very wide fields, which makes it very fast to look up a bunch of values

• Graph: for very special cases when you need to store and query relationships (followers on Twitter)

• KVS: when you really need high performance and just need to look up simple values
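
The difference in query flexibility between the document and KVS models can be sketched in plain Python. The find helper below only imitates the idea of matching on arbitrary fields; it is not MongoDB's API, and the sample users and function names are assumptions.

```python
import json

docs = [
    {"_id": "u1", "name": "huy", "city": "Tokyo", "langs": ["ruby", "scala"]},
    {"_id": "u2", "name": "an", "city": "Hanoi", "langs": ["ruby"]},
    {"_id": "u3", "name": "binh", "city": "Hanoi", "langs": ["go"]},
]


def find(docs, **criteria):
    """Document-style query: the store can look inside documents,
    so any field can be used as a filter."""
    return [
        d for d in docs
        if all(d.get(field) == value for field, value in criteria.items())
    ]


# KVS-style store: each value is an opaque blob, so the only query
# the store itself supports is "get by key".
kvs = {d["_id"]: json.dumps(d) for d in docs}


def kvs_get(store, key):
    """Fetch and decode one value by its key, or None if the key is absent."""
    blob = store.get(key)
    return json.loads(blob) if blob is not None else None


print([d["name"] for d in find(docs, city="Hanoi")])
print(kvs_get(kvs, "u1")["name"])
```

Answering "who lives in Hanoi?" from the KVS would mean fetching and decoding every value in the application, which is the flexibility you give up for a simpler and faster lookup path.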

Page 26: NoSQL for great good [hanoi.rb talk]

So it really depends, right?

Page 27: NoSQL for great good [hanoi.rb talk]

Data model for NoSQL is hard!

So be careful with your selection

Page 28: NoSQL for great good [hanoi.rb talk]

Sometimes the borderline of data modeling is blurred

We need other factors to consider

Page 29: NoSQL for great good [hanoi.rb talk]

Scalability

Page 30: NoSQL for great good [hanoi.rb talk]

First we need to know about the CAP theorem

Page 31: NoSQL for great good [hanoi.rb talk]

http://webpages.cs.luc.edu/~pld/353/gilbert_lynch_brewer_proof.pdf

Page 32: NoSQL for great good [hanoi.rb talk]

We can only have two of them!!!!!!!!

NO MORE!!!!

Page 33: NoSQL for great good [hanoi.rb talk]

http://blog.flux7.com/blogs/nosql/cap-theorem-why-does-it-matter

Page 34: NoSQL for great good [hanoi.rb talk]

Just ask yourself: what do you care about?

• You need very fast writes and reads, and data can be a little bit stale -> A + P

• You need transactions, and everyone must see the same view, but sometimes something must be locked -> C + P

• You don't need a distributed system that is fault-tolerant against network problems -> C + A
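
The A + P versus C + P choice can be shown with a toy two-node model in Python. This is a deliberately simplified sketch, not how any real database behaves: the class, the node names "a" and "b", and the rule that a CP cluster rejects every write during a partition are all assumptions made to keep it short.

```python
class Unavailable(Exception):
    """Raised when a CP cluster refuses a request during a partition."""


class TwoNodeCluster:
    def __init__(self, mode):
        if mode not in ("AP", "CP"):
            raise ValueError("mode must be 'AP' or 'CP'")
        self.mode = mode
        self.values = {"a": None, "b": None}  # each node's copy of one value
        self.partitioned = False              # True when a and b cannot talk

    def write(self, node, value):
        if not self.partitioned:
            # Healthy network: the write reaches both nodes.
            self.values["a"] = self.values["b"] = value
        elif self.mode == "AP":
            # Stay available: accept locally and let the other node go stale.
            self.values[node] = value
        else:
            # Stay consistent: refuse rather than let the copies diverge.
            raise Unavailable("cannot replicate during a partition")

    def read(self, node):
        return self.values[node]


ap = TwoNodeCluster("AP")
ap.write("a", 1)
ap.partitioned = True
ap.write("a", 2)
print(ap.read("a"), ap.read("b"))  # node b still serves the old value
```

In AP mode both nodes keep answering but can disagree for a while; in CP mode they always agree, at the price of rejecting writes while the network is split.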

Page 35: NoSQL for great good [hanoi.rb talk]

So we have two factors to think about. What else?

Page 36: NoSQL for great good [hanoi.rb talk]

Operation

Page 37: NoSQL for great good [hanoi.rb talk]

Programmers may not care, but

infrastructure engineers do

Page 38: NoSQL for great good [hanoi.rb talk]

What factors affect operation?

• What is your database's distribution model: how does it shard and replicate (master-slave or P2P)?

• Does your database run on the JVM? (operating a JVM system is waaaayyy more bothersome than a system written in C or C++)

• Does your database have a single point of failure?

• Is your database optimized for SSD only?

Page 39: NoSQL for great good [hanoi.rb talk]

Operation is hard

When you fail at operation, you lose your data

So choose what you know very well

Page 40: NoSQL for great good [hanoi.rb talk]

Conclusion

• It really depends!!!!

• Ask yourself: is NoSQL really needed?

• First know your requirements, know your data

• Investigate carefully before choosing any solution (when you choose wrong, you lose your data)