java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop
DESCRIPTION
Compendium of my Brisk, Cassandra & Hadoop talks from the summer of 2011, delivered at JavaOne 2011. I personally like the content in this one, as it covers a use-case-driven intro to Cassandra and NoSQL, followed by an intro to Hadoop: MapReduce, HDFS internals, the NameNode and the JobTracker. It also shows how Brisk removes the single points of failure in HDFS while providing one platform for real-time and batch storage and processing. (And the audience in attendance seemed to enjoy it.)
TRANSCRIPT
![Page 1: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/1.jpg)
Brisk: Truly peer-to-peer Hadoop High-order bits from Cassandra & Hadoop
srisatish ambati
@srisatish
![Page 2: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/2.jpg)
How many in audience…
![Page 3: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/3.jpg)
NoSQL - Know your queries.
![Page 4: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/4.jpg)
points
• Usecases
• Why cassandra?
• Usecase: Hadoop, Brisk
• FUD: Consistency
– Why facebook is not using Cassandra?
• Anti-patterns
• Community, Code, Tools
• Q&A
![Page 5: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/5.jpg)
Users. Netflix.
Key by Customer, read-heavy
Key by Customer:Movie, write-heavy
![Page 6: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/6.jpg)
TimeSeries: (several customers)
periodic readings: dev0, dev1…
deviceID:metric:timestamp -> value
Metrics typically way larger dataset than users.
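The deviceID:metric:timestamp -> value layout can be sketched with an in-memory sorted map standing in for the store. This is only an illustration: the class and method names are hypothetical, the zero-padded timestamp is an assumed encoding (non-negative milliseconds), and nothing here is Cassandra's API.

```java
import java.util.Locale;
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative only: a TreeMap stands in for a sorted, key-addressed store.
final class TimeSeriesKeys {
    private final TreeMap<String, Double> store = new TreeMap<>();

    // Compose deviceID:metric:timestamp. The timestamp is zero-padded to 13 digits
    // (assumed non-negative) so that string order matches chronological order.
    static String key(String deviceId, String metric, long timestampMillis) {
        return String.format(Locale.ROOT, "%s:%s:%013d", deviceId, metric, timestampMillis);
    }

    void record(String deviceId, String metric, long timestampMillis, double value) {
        store.put(key(deviceId, metric, timestampMillis), value);
    }

    // Readings for one device and metric with fromMillis <= timestamp < toMillis.
    SortedMap<String, Double> range(String deviceId, String metric, long fromMillis, long toMillis) {
        return store.subMap(key(deviceId, metric, fromMillis), key(deviceId, metric, toMillis));
    }

    public static void main(String[] args) {
        TimeSeriesKeys ts = new TimeSeriesKeys();
        ts.record("dev0", "temp", 1000L, 20.5);
        ts.record("dev0", "temp", 2000L, 21.0);
        ts.record("dev1", "temp", 1500L, 19.0);
        System.out.println(ts.range("dev0", "temp", 0L, 3000L));
    }
}
```

Because the key leads with the device and metric, one device's readings sit next to each other and a time window becomes a contiguous range scan.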
![Page 7: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/7.jpg)
Why Cassandra?
![Page 8: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/8.jpg)
Operational simplicity
peer-to-peer
![Page 9: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/9.jpg)
Operational simplicity
peer-to-peer
write
read
![Page 10: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/10.jpg)
Replication:
Multi-datacenter
Multi-region ec2
Multi-availability zones
![Page 11: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/11.jpg)
Replication:
Multi-datacenter
Multi-region ec2, aws
Multi-availability zones
dc1 dc2
reads local
![Page 12: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/12.jpg)
“Movie marathons on Netflix awaiting AWS to come back up.” #ec2disabled
4.21.2011, Amazon Web Services outage:
![Page 13: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/13.jpg)
Netflix was running on AWS.
4.21.2011, Amazon Web Services outage:
![Page 14: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/14.jpg)
fast durable writes. fast reads.
![Page 15: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/15.jpg)
Writes
Sequential, append-only.
~1-5ms
![Page 16: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/16.jpg)
Writes
Sequential, append-only.
~1-5ms
On cloud: ephemeral disks rock!
![Page 17: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/17.jpg)
Reads
Local
Key & row caches (also, jna-based 0xffheap)
indexes, materialized
![Page 18: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/18.jpg)
Reads
Local
Key & row caches (also, jna-based 0xffheap)
indexes, materialized
ssds: improved read performance!
![Page 19: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/19.jpg)
amortize
Replication over writes
Repair over reads
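"Repair over reads" can be illustrated with a toy model: a read consults the replicas, keeps the version with the newest timestamp, and writes that version back to any replica that is behind. This is a simplified sketch under assumed names (`ReadRepairSketch`, `Versioned`, `writeTo`, `valueAt` are all hypothetical); Cassandra's actual read path is more involved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model: each replica is a map; the newest timestamp wins.
final class ReadRepairSketch {
    // A value tagged with the write timestamp used to pick a winner.
    static final class Versioned {
        final String value;
        final long timestamp;

        Versioned(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final List<Map<String, Versioned>> replicas = new ArrayList<>();

    ReadRepairSketch(int replicaCount) {
        for (int i = 0; i < replicaCount; i++) {
            replicas.add(new HashMap<>());
        }
    }

    // Write to a single replica; lets a caller simulate replicas that missed an update.
    void writeTo(int replica, String key, String value, long timestamp) {
        replicas.get(replica).put(key, new Versioned(value, timestamp));
    }

    // Read from every replica, keep the newest version, and push it to stale replicas.
    String read(String key) {
        Versioned newest = null;
        for (Map<String, Versioned> replica : replicas) {
            Versioned v = replica.get(key);
            if (v != null && (newest == null || v.timestamp > newest.timestamp)) {
                newest = v;
            }
        }
        if (newest == null) {
            return null; // no replica has the key
        }
        for (Map<String, Versioned> replica : replicas) {
            Versioned v = replica.get(key);
            if (v == null || v.timestamp < newest.timestamp) {
                replica.put(key, newest); // the repair step, paid for by this read
            }
        }
        return newest.value;
    }

    // Inspect one replica directly, without triggering a repair.
    String valueAt(int replica, String key) {
        Versioned v = replicas.get(replica).get(key);
        return v == null ? null : v.value;
    }
}
```

The cost of bringing replicas back in sync is spread across ordinary reads instead of being paid in one large batch job.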
![Page 20: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/20.jpg)
Distribution between nodes
Gossip
Anti-entropy
Failure-detector
Lightweight
![Page 21: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/21.jpg)
Clients:
cql, thrift
pycassa, phpcassa
hector, pelops
(scala, ruby, clojure)
![Page 22: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/22.jpg)
Usecase #3: hadoop
Hdfs cassandra hive
Logs stats analytics
![Page 23: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/23.jpg)
Brisk
Truly peer-to-peer hadoop.
![Page 24: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/24.jpg)
mv computation
not data
![Page 25: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/25.jpg)
![Page 26: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/26.jpg)
word count in MapReduce
map(String key, String value):
    // key: document name
    // value: document contents
    for each word w in value:
        EmitIntermediate(w, "1");

reduce(String key, Iterator values):
    // key: a word
    // values: a list of counts
    int result = 0;
    for each v in values:
        result += ParseInt(v);
    Emit(AsString(result));
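The word-count pseudocode can be tried out in plain Java without a cluster. The sketch below is a single-process illustration, not Hadoop's API: `WordCountSketch`, `run`, and the in-memory grouping step that stands in for the shuffle are all assumptions made for the example.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process illustration of map -> group by key -> reduce.
final class WordCountSketch {
    // map: emit (word, 1) for every whitespace-separated word in the document.
    static List<Map.Entry<String, Integer>> map(String documentName, String contents) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : contents.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // reduce: sum the counts collected for one word.
    static int reduce(String word, List<Integer> counts) {
        int result = 0;
        for (int count : counts) {
            result += count;
        }
        return result;
    }

    // Driver: run map over each document, group values by key, then reduce each group.
    static Map<String, Integer> run(Map<String, String> documents) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> doc : documents.entrySet()) {
            for (Map.Entry<String, Integer> kv : map(doc.getKey(), doc.getValue())) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }
}
```

In a real cluster the map calls run on many machines next to the data, and the grouping step is the network shuffle between mappers and reducers.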
![Page 27: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/27.jpg)
Parallel Execution View
![Page 28: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/28.jpg)
![Page 29: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/29.jpg)
![Page 30: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/30.jpg)
immutable data
write-once-read-many!
Files once created, written & closed...
not changing!
![Page 31: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/31.jpg)
jobtracker, tasktracker
hdfs: namenode, datanode
![Page 32: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/32.jpg)
cloudera
amazon: elastic map reduce
hortonworks
mapR
brisk
![Page 33: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/33.jpg)
Tools & Analytics
Hive, Pig, R
Karmasphere
Datameer
… dozens of stealth startups!
![Page 34: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/34.jpg)
![Page 35: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/35.jpg)
“However, given that there is only a single master, its failure is unlikely;”
The MapReduce paper, 2004. Jeffrey Dean and Sanjay Ghemawat, Google.
![Page 36: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/36.jpg)
Namenode decomposition, explained.
![Page 37: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/37.jpg)
![Page 38: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/38.jpg)
NameNode:
Single Master node
Single Machine Address space
Single Point of failure
![Page 39: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/39.jpg)
![Page 40: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/40.jpg)
Use column families (tables)
inode
sblock
![Page 41: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/41.jpg)
One kind of node
no master node, no spof
peer-to-peer
![Page 42: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/42.jpg)
![Page 43: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/43.jpg)
near-real time hadoop
Low latency: cassandra_dc nodes
Batch Analytics: brisk_dc nodes
![Page 44: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/44.jpg)
BriskSimpleSnitch.java
if (TrackerInitializer.isTrackerNode) {
    myDC = BRISK_DC;
    logger.info("Detected Hadoop trackers are enabled, setting my DC to " + myDC);
} else {
    myDC = CASSANDRA_DC;
    logger.info("Looks like Vanilla Cassandra nodes, setting my DC to " + myDC);
}
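The rule in the snitch excerpt can be exercised in isolation. The sketch below is not the Brisk source: `SnitchSketch` is a hypothetical class, the string values of the two constants are assumed, and a plain boolean replaces `TrackerInitializer.isTrackerNode`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Stand-alone illustration of the "tracker node => analytics datacenter" rule.
final class SnitchSketch {
    static final String BRISK_DC = "Brisk";         // assumed value
    static final String CASSANDRA_DC = "Cassandra"; // assumed value

    // Nodes running Hadoop trackers go to the batch DC; all others to the low-latency DC.
    static String datacenterFor(boolean isTrackerNode) {
        return isTrackerNode ? BRISK_DC : CASSANDRA_DC;
    }

    // Group node names (in sorted order) by the datacenter the rule assigns them.
    static Map<String, List<String>> groupByDatacenter(Map<String, Boolean> trackerEnabledByNode) {
        Map<String, List<String>> byDc = new TreeMap<>();
        for (Map.Entry<String, Boolean> node : new TreeMap<>(trackerEnabledByNode).entrySet()) {
            byDc.computeIfAbsent(datacenterFor(node.getValue()), dc -> new ArrayList<>())
                .add(node.getKey());
        }
        return byDc;
    }
}
```

Splitting one ring into two logical datacenters this way lets batch jobs run on the tracker nodes while low-latency traffic stays on the others.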
![Page 45: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/45.jpg)
Hive: SQL-like access
cli, hwi, jdbc, metastore
Pushdown predicates (v beta2)
![Page 46: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/46.jpg)
hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);
hive> LOAD DATA LOCAL INPATH '$BRISK_HOME/resources/hive/examples/files/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');
hive> SELECT count(*), ds FROM invites GROUP BY ds;
http://www.datastax.com/docs/0.8/brisk/about_hive
![Page 47: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/47.jpg)
ETL
Real-time
Cassandra CFs
DataCenters
Scale
![Page 48: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/48.jpg)
![Page 49: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/49.jpg)
No me in team!
Ben Coverston Ben Werther Brandon Williams Cathy Daw Jackson Chung Jake Luciani Joaquin Casares Jonathan Ellis
Michael Allen Mike Bulman Nate McCall Nick M Bailey Patricio Echague Tyler Hobbs SriSatish Ambati Yewei Zhang
![Page 50: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/50.jpg)
@srisatish
100-node Brisk Cluster on Opscenter
![Page 51: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/51.jpg)
FUD, acronym: fear, uncertainty, doubt.
![Page 52: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/52.jpg)
Consistency: R + W > N
ORACLE, 2-node: R=1, W=2, N=2, (T=2)
DNS
* N is the replication factor. Not to be confused with T = total # of nodes
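The R + W > N rule is simple arithmetic: when the number of replicas read plus the number written exceeds the replication factor, every read set overlaps every write set in at least one replica. A minimal sketch, with hypothetical class and method names:

```java
// Arithmetic behind R + W > N; names are illustrative only.
final class ConsistencyMath {
    // Quorum for replication factor n: a strict majority of the replicas.
    static int quorum(int n) {
        return n / 2 + 1;
    }

    // True when any r replicas read must include at least one of the w replicas written.
    static boolean readsOverlapWrites(int r, int w, int n) {
        if (n < 1 || r < 1 || w < 1 || r > n || w > n) {
            throw new IllegalArgumentException("need 1 <= r, w <= n");
        }
        return r + w > n;
    }

    public static void main(String[] args) {
        int n = 3;
        int q = quorum(n);
        System.out.println("quorum for N=3: " + q);
        System.out.println("QUORUM/QUORUM overlaps: " + readsOverlapWrites(q, q, n));
        System.out.println("ONE/ONE overlaps: " + readsOverlapWrites(1, 1, n));
        // The slide's 2-node example: R=1, W=2, N=2.
        System.out.println("R=1, W=2, N=2 overlaps: " + readsOverlapWrites(1, 2, 2));
    }
}
```

Quorum reads with quorum writes always satisfy the inequality, since two majorities of the same N replicas must share a member.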
![Page 53: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/53.jpg)
Tune-able, flexibility.
For High Consistency:
read:quorum, write:quorum
For High Availability:
high W, low R.
![Page 54: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/54.jpg)
Consistency: R + W > N
ORACLE, 2-node: R=1, W=2, N=2, (T=2)
DNS
"brisk.consistencylevel.read", "QUORUM";
"brisk.consistencylevel.write", "QUORUM";
* N is the replication factor. Not to be confused with T = total # of nodes
![Page 55: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/55.jpg)
![Page 56: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/56.jpg)
Inbox Search: 600+ cores, 120+ TB (2008)
Went from 100-500m users.
Average NoSQL deployment size: ~6-12 nodes.
![Page 57: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/57.jpg)
Usecase #5: search
Apache Solr + Cassandra = Solandra
Other inbox/file searches: xobni, c3
github.com/tjake/solandra
![Page 58: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/58.jpg)
“Eventual consistency is harder to program.”
mostly immutable data.
complex systems at scale.
![Page 59: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/59.jpg)
Miscellaneous, Myth: data-loss, partial rows.
writes are durable.
![Page 60: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/60.jpg)
Anti-Patterns
Transactions
Joins
Read before write
![Page 61: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/61.jpg)
Anti-Patterns for cloud
ebs
jvm, virtualized
single region
![Page 62: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/62.jpg)
A few more good reasons for Cassandra...
![Page 63: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/63.jpg)
Tools
AMIs, OpsCenter, DataStax
AppDynamics
Getting Started with brisk ami
Netflix just builds AMIs for deployment!
![Page 64: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/64.jpg)
Beautiful C0de
= new code(); //less is more
~90k.java.concurrent.@annotate.
bloomfilters, merkletrees.
non-blocking, staged-event-driven.
bigtable, dynamo.
![Page 65: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/65.jpg)
Current & Future Focus:
Distributed Counters, CQL.
Simple client.
operational smoothening.
compaction.
![Page 66: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/66.jpg)
Community
Robust. Rapid. Brisk #
Professional support from DataStax.
git clone [email protected]:riptano/brisk.git
engineers: independent, startups, large companies, Rackspace, Twitter, Netflix...
Come join the efforts!
![Page 67: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/67.jpg)
Usecase #4: first NoSQL, then scale!
simpledb Cassandra
mongodb Cassandra
![Page 68: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/68.jpg)
![Page 69: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/69.jpg)
![Page 70: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/70.jpg)
Copyright: xkcd
![Page 71: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/71.jpg)
Copyright: plantoys
… more than one way to do it!
![Page 72: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/72.jpg)
Summary -high scale peer-to-peer datastore
best friend for multi-region, multi-zone availability.
Hadoop – HDFS engulfing the DataWorld
Brisk – best of both worlds!
![Page 73: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/73.jpg)
Q&A
@srisatish
![Page 74: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/74.jpg)
Bigtable, 2006
Dynamo, 2007
Cassandra: OSS, 2008; Incubator, 2009; TLP, 2010
Brisk
![Page 75: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop](https://reader036.vdocument.in/reader036/viewer/2022062616/54973a2bb47959a9088b4585/html5/thumbnails/75.jpg)
NoSQL - Know your queries.