A Walk down NOSQL Lane in the Cloud
Introduction to NOSQL and various NOSQL solutions.
![Page 1: A walk down NOSQL Lane in the cloud](https://reader034.vdocument.in/reader034/viewer/2022052505/5558c454d8b42a995d8b4632/html5/thumbnails/1.jpg)
A Walk down NOSQL Lane in the Cloud
New York City Cloud Computing Group
February 2011
Alexander Sicular
@siculars
![Page 2: A walk down NOSQL Lane in the cloud](https://reader034.vdocument.in/reader034/viewer/2022052505/5558c454d8b42a995d8b4632/html5/thumbnails/2.jpg)
Who is this blowhard?
Columbia University pays my mortgage
For the better part of a decade in Medical Informatics
Am not shilling for any of these companies
Am not a computer scientist
Am a computer science enthusiast particularly in the area of Informatics
![Page 3: A walk down NOSQL Lane in the cloud](https://reader034.vdocument.in/reader034/viewer/2022052505/5558c454d8b42a995d8b4632/html5/thumbnails/3.jpg)
When I put my data in the “cloud”, to me it just means that it’s virtualized in someone else’s server room
![Page 4: A walk down NOSQL Lane in the cloud](https://reader034.vdocument.in/reader034/viewer/2022052505/5558c454d8b42a995d8b4632/html5/thumbnails/4.jpg)
Many, many providers and only growing
Amazon, Rackspace, Joyent, CouchOne, Cloudant, Azure, GAE, Heroku, no.de
Outsourced management
Zero capex
Controlled costs
...the Silver Lining
![Page 5: A walk down NOSQL Lane in the cloud](https://reader034.vdocument.in/reader034/viewer/2022052505/5558c454d8b42a995d8b4632/html5/thumbnails/5.jpg)
...With a Chance of Rain?
Vendor lock in
Unreliable performance
i/o
cpu, memory
Bare metal > software virtualization
![Page 6: A walk down NOSQL Lane in the cloud](https://reader034.vdocument.in/reader034/viewer/2022052505/5558c454d8b42a995d8b4632/html5/thumbnails/6.jpg)
NoSQL or NOSQL?
Not Only SQL
Non/post relational
Big tent policy
Umbrella term
Fragmented
http://www.flickr.com/photos/morgennebel/2933723145/
![Page 7: A walk down NOSQL Lane in the cloud](https://reader034.vdocument.in/reader034/viewer/2022052505/5558c454d8b42a995d8b4632/html5/thumbnails/7.jpg)
Your Usage Patterns
Read vs. Write
Mutable vs. Immutable
Product Considerations:
In place updates
Write Only Logs (append-only)
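The difference between in-place updates and a write only log can be sketched in a few lines. This is a toy under stated assumptions, not how any product on these slides is implemented: the class name `AppendOnlyStore`, the JSON line format, and the in-memory offset index are all made up for illustration. An in-place store would overwrite the old record; here every write is appended and only the index moves.

```python
import json
import os
import tempfile


class AppendOnlyStore:
    """Toy key-value store that only ever appends records to a log file."""

    def __init__(self, path):
        self.path = path
        self.offsets = {}  # key -> byte offset of the latest record for that key
        open(self.path, "ab").close()  # create the log if it does not exist yet

    def put(self, key, value):
        # The new record starts where the file currently ends.
        offset = os.path.getsize(self.path)
        record = (json.dumps({"k": key, "v": value}) + "\n").encode("utf-8")
        with open(self.path, "ab") as log:
            log.write(record)
        # Old records for this key stay in the file; only the index changes.
        self.offsets[key] = offset

    def get(self, key):
        with open(self.path, "rb") as log:
            log.seek(self.offsets[key])
            return json.loads(log.readline())["v"]


# Usage: three writes to two keys leave three records in the log.
log_path = os.path.join(tempfile.mkdtemp(), "data.log")
store = AppendOnlyStore(log_path)
store.put("user:1", {"name": "alice"})
store.put("user:2", {"name": "bob"})
store.put("user:1", {"name": "alice", "city": "NYC"})  # appended, not overwritten
print(store.get("user:1"))  # {'name': 'alice', 'city': 'NYC'}
```

The trade-off is roughly this: writes are sequential and cheap, but stale records pile up until something compacts the log, so the pattern suits write-heavy or immutable data better than data that is rewritten constantly.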
![Page 8: A walk down NOSQL Lane in the cloud](https://reader034.vdocument.in/reader034/viewer/2022052505/5558c454d8b42a995d8b4632/html5/thumbnails/8.jpg)
This vs. That
Riak wiki comparisons page http://wiki.basho.com/Riak-Comparisons.html
Popular one-page comparison of a number of NOSQL players by Kristof Kovacs: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
![Page 9: A walk down NOSQL Lane in the cloud](https://reader034.vdocument.in/reader034/viewer/2022052505/5558c454d8b42a995d8b4632/html5/thumbnails/9.jpg)
NOSQL concepts are Not Brand New
Memcached since 2003 http://memcached.org
Google papers 2003-2006
Amazon Dynamo 2007
Consistent Hashing (libketama for memcached clients) 2007 http://www.last.fm/user/RJ/journal/2007/04/10/rz_libketama_-_a_consistent_hashing_algo_for_memcache_clients
Using relational systems as a key-value blob store
2009 FriendFeed (not the first) http://bret.appspot.com/entry/how-friendfeed-uses-mysql
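Using a relational system as a key-value blob store might look something like the sketch below. It is only loosely in the spirit of the linked FriendFeed post: `sqlite3` and JSON are stand-ins chosen so the example runs anywhere, and the table, column, and function names (`entities`, `index_user_id`, `save`, `load`, `find_by_user`) are assumptions for illustration.

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# One opaque blob per entity; the database knows nothing about its fields.
conn.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
# A separate index table makes one field of the blob searchable.
conn.execute("CREATE TABLE index_user_id (user_id TEXT, entity_id TEXT)")


def save(entity):
    """Store a dict as a JSON blob and refresh its index row."""
    entity_id = entity.setdefault("id", uuid.uuid4().hex)
    conn.execute(
        "INSERT OR REPLACE INTO entities (id, body) VALUES (?, ?)",
        (entity_id, json.dumps(entity)),
    )
    conn.execute("DELETE FROM index_user_id WHERE entity_id = ?", (entity_id,))
    if "user_id" in entity:
        conn.execute(
            "INSERT INTO index_user_id (user_id, entity_id) VALUES (?, ?)",
            (entity["user_id"], entity_id),
        )
    return entity_id


def load(entity_id):
    row = conn.execute(
        "SELECT body FROM entities WHERE id = ?", (entity_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


def find_by_user(user_id):
    rows = conn.execute(
        "SELECT e.body FROM entities e "
        "JOIN index_user_id i ON e.id = i.entity_id WHERE i.user_id = ?",
        (user_id,),
    ).fetchall()
    return [json.loads(r[0]) for r in rows]


# Usage: entities with different shapes share one table, no ALTER TABLE needed.
save({"user_id": "u1", "title": "first post"})
save({"user_id": "u1", "title": "second post", "tags": ["nosql"]})
print(len(find_by_user("u1")))  # 2
```

Adding a new field means writing it into the blob; making it queryable means adding another index table, which is the part that needs planning.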
![Page 10: A walk down NOSQL Lane in the cloud](https://reader034.vdocument.in/reader034/viewer/2022052505/5558c454d8b42a995d8b4632/html5/thumbnails/10.jpg)
Why NOSQL
Support for “Very Large” data sets
Schemaless
Denormalized
Green field
New applications
http://www.flickr.com/photos/gailtang/1243984297/
![Page 11: A walk down NOSQL Lane in the cloud](https://reader034.vdocument.in/reader034/viewer/2022052505/5558c454d8b42a995d8b4632/html5/thumbnails/11.jpg)
Academia
Google:
Bigtable http://labs.google.com/papers/bigtable.html
GFS http://labs.google.com/papers/gfs.html
M/R http://labs.google.com/papers/mapreduce.html
Amazon:
Dynamo http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf
NOSQL Summer http://nosqlsummer.org/papers
![Page 12: A walk down NOSQL Lane in the cloud](https://reader034.vdocument.in/reader034/viewer/2022052505/5558c454d8b42a995d8b4632/html5/thumbnails/12.jpg)
Under the Hood Terminology
Write Only Log (append-only) http://en.wikipedia.org/wiki/Log-structured_file_system
Merkle Trees http://en.wikipedia.org/wiki/Hash_tree
B-trees http://en.wikipedia.org/wiki/B-tree
Vector clock http://en.wikipedia.org/wiki/Vector_clock
Bloom filters http://en.wikipedia.org/wiki/Bloom_filters
Big O Notation http://en.wikipedia.org/wiki/Big_o_notation
Consistent Hashing http://en.wikipedia.org/wiki/Consistent_hashing
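Of the terms above, consistent hashing is probably the easiest to show in code. The following is a minimal sketch, assuming SHA-256 as the hash and 100 virtual points per node; the class name `HashRing` and its methods are hypothetical and not taken from libketama or any product in this talk.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual points placed on the ring per node
        self._points = []         # sorted hash positions
        self._owner = {}          # hash position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            del self._owner[point]
            self._points.remove(point)

    def lookup(self, key):
        """Return the node owning the first point clockwise from the key."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


# Usage: adding a node only moves the keys that the new node takes over.
ring = HashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"key{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("cache-d")
moved = [k for k in keys if ring.lookup(k) != before[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

With plain `hash(key) % number_of_nodes`, adding a fourth node would remap most keys; with the ring, only the share claimed by the new node moves.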
![Page 13: A walk down NOSQL Lane in the cloud](https://reader034.vdocument.in/reader034/viewer/2022052505/5558c454d8b42a995d8b4632/html5/thumbnails/13.jpg)
CAP Theorem
http://en.wikipedia.org/wiki/CAP_theorem
Consistency
Availability
Partition Tolerance
Pick two?
http://guide.couchdb.org/draft/consistency.html
![Page 14: A walk down NOSQL Lane in the cloud](https://reader034.vdocument.in/reader034/viewer/2022052505/5558c454d8b42a995d8b4632/html5/thumbnails/14.jpg)
CouchDB
CouchOne, Cloudant
Erlang
Extreme replication scenarios
Works on phones
Updated indexing (b-tree)
HTTP interface
Offline usage
Sharded scaling
![Page 15: A walk down NOSQL Lane in the cloud](https://reader034.vdocument.in/reader034/viewer/2022052505/5558c454d8b42a995d8b4632/html5/thumbnails/15.jpg)
CouchDB Internal Architecture
http://nosqlpedia.com/wiki/File:CouchDB-Arch.JPG
![Page 16: A walk down NOSQL Lane in the cloud](https://reader034.vdocument.in/reader034/viewer/2022052505/5558c454d8b42a995d8b4632/html5/thumbnails/16.jpg)
MongoDB
10Gen, MongoHQ, MongoLab
C++
huMONGOus
Sharded scaling, replicated master/slave
Located in NYC (go visit them)
Soft landing for those coming from mysql (relational databases)
Native javascript
Secondary indexes
![Page 17: A walk down NOSQL Lane in the cloud](https://reader034.vdocument.in/reader034/viewer/2022052505/5558c454d8b42a995d8b4632/html5/thumbnails/17.jpg)
MongoDB Sharding Diagram
http://www.snailinaturtleneck.com/blog/2010/03/30/sharding-with-the-fishes/
![Page 18: A walk down NOSQL Lane in the cloud](https://reader034.vdocument.in/reader034/viewer/2022052505/5558c454d8b42a995d8b4632/html5/thumbnails/18.jpg)
MySQL to Mongo Query similarity
http://nosqlpedia.com/wiki/File:MongoDB.JPG
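The linked image is not reproduced in this transcript. As a rough illustration of the kind of similarity the slide title points at, the sketch below pairs two SQL statements with MongoDB-style query documents and evaluates them against plain Python dicts. The comparison operators (`$gt`, `$gte`, `$lt`, `$lte`) are real MongoDB query operators; the `matches` and `find` helpers are a toy stand-in written for this example, not the MongoDB driver API.

```python
import operator

# Subset of MongoDB comparison operators supported by this toy matcher.
OPERATORS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(doc, query):
    """Return True if doc satisfies every field condition in query."""
    for field, condition in query.items():
        value = doc.get(field)
        if isinstance(condition, dict):
            # e.g. {"age": {"$gt": 30}}
            for op, operand in condition.items():
                if value is None or not OPERATORS[op](value, operand):
                    return False
        elif value != condition:
            # e.g. {"name": "ann"} is a plain equality test
            return False
    return True


def find(collection, query):
    return [doc for doc in collection if matches(doc, query)]


users = [
    {"name": "ann", "age": 34},
    {"name": "bob", "age": 27},
    {"name": "cat", "age": 41, "city": "NYC"},  # extra field, no schema change
]

# SQL:   SELECT * FROM users WHERE age > 30
# Mongo: db.users.find({age: {$gt: 30}})
print(find(users, {"age": {"$gt": 30}}))

# SQL:   SELECT * FROM users WHERE name = 'ann' AND age >= 18
# Mongo: db.users.find({name: "ann", age: {$gte: 18}})
print(find(users, {"name": "ann", "age": {"$gte": 18}}))
```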
![Page 19: A walk down NOSQL Lane in the cloud](https://reader034.vdocument.in/reader034/viewer/2022052505/5558c454d8b42a995d8b4632/html5/thumbnails/19.jpg)
Riak
Basho, Joyent
Erlang
Distributed
HTTP, protobuf
Native javascript, erlang
Multiple backends
Homogeneous
CAP tunable
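The slide does not spell out what "CAP tunable" means. In Dynamo-style stores such as Riak it usually refers to choosing how many of the N replicas must answer a read (R) or acknowledge a write (W). The sketch below, with the hypothetical function `read_overlaps_write`, brute-forces the well-known rule of thumb that a read is guaranteed to touch at least one replica holding the latest write exactly when R + W > N.

```python
from itertools import combinations


def read_overlaps_write(n, r, w):
    """True if every r-replica read set shares a replica with every w-replica write set."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


# Usage: with 3 replicas, compare a quorum setting against a fast-but-loose one.
print(read_overlaps_write(3, 2, 2))  # True: R + W > N, reads see the latest write
print(read_overlaps_write(3, 1, 1))  # False: a read may hit a stale replica
```

Lower R and W favor availability and latency; higher values favor consistency. That dial is one way to read "Pick two?" on the CAP slide.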
![Page 20: A walk down NOSQL Lane in the cloud](https://reader034.vdocument.in/reader034/viewer/2022052505/5558c454d8b42a995d8b4632/html5/thumbnails/20.jpg)
Hadoop
Cloudera, Apache Foundation
Java
High latency
Batch oriented
HDFS is GFS based
Open source Google stack via the Google papers
Huge ecosystem
Yahoo, FB, Twitter, Fortune 500
Pig, Hive, Flume
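The batch model behind Hadoop comes from the MapReduce paper listed on the Academia slide. A minimal in-process sketch of the idea, using the classic word count, is below. It does not use the Hadoop API; the function names are illustrative, and a real job would spread the map and reduce phases across many machines and read its input from HDFS.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Usage
print(word_count(["the cloud is big", "the data is bigger"]))
```

Because each map call sees one document and each reduce call sees one key, both phases parallelize naturally, which is what makes the high-latency batch approach pay off on very large inputs.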
![Page 21: A walk down NOSQL Lane in the cloud](https://reader034.vdocument.in/reader034/viewer/2022052505/5558c454d8b42a995d8b4632/html5/thumbnails/21.jpg)
HBase
Java
Low latency store
sits on top of Hadoop
Modeled after Google Bigtable
Column oriented
Thrift, protobuf
Backend for new Facebook Messaging service
![Page 22: A walk down NOSQL Lane in the cloud](https://reader034.vdocument.in/reader034/viewer/2022052505/5558c454d8b42a995d8b4632/html5/thumbnails/22.jpg)
Cassandra
Apache
Java
Column oriented
Like Bigtable and Dynamo
Originated at Facebook
At Twitter, Distributed counting
http://www.infoq.com/presentations/NoSQL-at-Twitter-by-Ryan-King
http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011
![Page 23: A walk down NOSQL Lane in the cloud](https://reader034.vdocument.in/reader034/viewer/2022052505/5558c454d8b42a995d8b4632/html5/thumbnails/23.jpg)
Redis
OpenRedis
C
REmote DIctionary Server
Specific data structures
incredibly fast
memcached on steroids
replicated master/slave
![Page 24: A walk down NOSQL Lane in the cloud](https://reader034.vdocument.in/reader034/viewer/2022052505/5558c454d8b42a995d8b4632/html5/thumbnails/24.jpg)
Commonalities
Open Source
Adherence to common or standard:
data formats
json, bson, utf8, binary
data transport mechanisms
http, thrift, protobuf, simple wire protocols
![Page 25: A walk down NOSQL Lane in the cloud](https://reader034.vdocument.in/reader034/viewer/2022052505/5558c454d8b42a995d8b4632/html5/thumbnails/25.jpg)
Ok. So Now What?
Analyze your requirements
Mailing lists
IRC, twitter
Project pages, wiki
Github/Google Code/Bitbucket:
project page
specific language clients
![Page 26: A walk down NOSQL Lane in the cloud](https://reader034.vdocument.in/reader034/viewer/2022052505/5558c454d8b42a995d8b4632/html5/thumbnails/26.jpg)
Variety Pack
Hybrid architectures will become the norm
Twitter - mysql, cassandra, hadoop
Google - mysql, GAE (Bigtable)
Facebook - mysql, cassandra, hbase, memcached
Yahoo - mysql, hadoop
LinkedIn - voldemort
http://www.flickr.com/photos/uncleweed/82245324/
![Page 27: A walk down NOSQL Lane in the cloud](https://reader034.vdocument.in/reader034/viewer/2022052505/5558c454d8b42a995d8b4632/html5/thumbnails/27.jpg)
Questions?
New York City Cloud Computing Group
February 2011
Alexander Sicular
@siculars