Brisk Hadoop, June 2011
Brisk: Truly peer-to-peer Hadoop
srisatish.ambati AT gmail.com DataStax/OpenJDK @srisatish
Brisk: Hive + Hadoop + Cassandra
Map Reduce
You have large sets of data, and you can work on small pieces of them in parallel.
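A minimal single-JVM sketch of that idea, not Hadoop's API: each small piece is counted independently (map), then the partial counts are merged (reduce). The class and method names are hypothetical, and the sketch assumes Java 8 or later for streams and lambdas.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative only: imitates the map and reduce phases inside one JVM.
final class WordCountSketch {

    // Map phase: count the words in one small piece of the input.
    static Map<String, Integer> mapChunk(String chunk) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : chunk.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Reduce phase: merge the partial counts from every piece.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, n) -> total.merge(word, n, Integer::sum));
        }
        return total;
    }

    // The pieces are independent, so they may be processed in parallel.
    static Map<String, Integer> wordCount(List<String> chunks) {
        List<Map<String, Integer>> partials = chunks.parallelStream()
                .map(WordCountSketch::mapChunk)
                .collect(Collectors.toList());
        return reduce(partials);
    }

    public static void main(String[] args) {
        List<String> chunks = Arrays.asList("a b a", "b c", "a");
        System.out.println(wordCount(chunks));
    }
}
```

Because no piece depends on another, the map step can run on as many cores or machines as there are pieces; only the reduce step has to see all partial results.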
Map Reduce
Multicore MapReduce framework, Kunle, et al.
Parallel Execution View
JobTracker
NameNode
HDFS
Write once, read many!
A file, once created, written, and closed, need not be changed.
Move computation, not data
DataNodes: Read, Write Blocks
NameNode:
Single master node
Single machine address space
Single point of failure
Enter the Cassandra: High Scale
Peer-to-peer
When “it” does not fit in a single node!… Enter the distributed dragon!
NameNode
DataNodes
One kind of node!
Cassandra: High Scale
Peer-to-peer
Portfolio Demo
Low latency: live tick prices for stocks.
Batch analytics: historical EOD prices, Value at Risk.
http://www.datastax.com/docs/0.8/brisk/brisk_demo
Demo URLs (good for this demo only)
http://ec250194143.compute1.amazonaws.com:8888/opscenter/index.html
http://ec26720212176.compute1.amazonaws.com:50030/jobdetails.jsp?jobid=job_201105310219_0008&refresh=30
http://ec250194143.compute1.amazonaws.com:8983/portfolio/
Bigtable, 2006
Dynamo, 2007
OSS, 2008
Incubator, 2009
TLP, 2010
[Figure: ring of nodes A, L, T, W, F, P, Y, U, with Key “C” placed on the ring]
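A minimal sketch of how a key such as “C” could be placed on a ring of nodes. It assumes the letters on this slide are node tokens and uses plain string ordering in place of a real partitioner. The class and method names are hypothetical and are not Cassandra's API. Replicas are taken as the next nodes clockwise, which is one simple placement strategy among several.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a sorted map stands in for the token ring.
final class TokenRingSketch {
    // token -> node name, kept in token order
    private final TreeMap<String, String> ring = new TreeMap<>();

    // Hypothetical helper: each node's token doubles as its name.
    static TokenRingSketch withNodes(String... tokens) {
        TokenRingSketch sketch = new TokenRingSketch();
        for (String token : tokens) {
            sketch.ring.put(token, token);
        }
        return sketch;
    }

    // Walk clockwise from the key's position and collect the replicas.
    List<String> replicasFor(String keyToken, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        int wanted = Math.min(replicationFactor, ring.size());
        // First node whose token is >= the key's token, wrapping around the ring.
        Map.Entry<String, String> entry = ring.ceilingEntry(keyToken);
        if (entry == null) {
            entry = ring.firstEntry();
        }
        while (replicas.size() < wanted) {
            replicas.add(entry.getValue());
            entry = ring.higherEntry(entry.getKey());
            if (entry == null) {
                entry = ring.firstEntry();
            }
        }
        return replicas;
    }

    public static void main(String[] args) {
        TokenRingSketch ring = withNodes("A", "F", "L", "P", "T", "U", "W", "Y");
        System.out.println(ring.replicasFor("C", 3));
    }
}
```

Every node runs the same lookup, so any node can route a request for any key; no master has to be consulted.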
Cassandra: High Scale
Peer-to-peer
No SPOF
Brisk
Brisk: HowStuffWorks version
YDH security edition (soon to be Apache)
Apache Hive: SQL-like access
CassandraHandler
Cassandra 0.8
Use ColumnFamilies: inode, sblock
String keyspace = "cfs";
CfDef cf = new CfDef();
cf.setName(inodeDefaultCf);
cf.setComparator_type("BytesType");
…
cf.setName(sblockDefaultCf);
cf.setKey_cache_size(1000000); // shown as "1M" on the slide
cf.setComment("Stores blocks of information associated with a inode");
cf.setKeyspace(keyspace);
Consistency: R + W > N
"brisk.consistencylevel.read", "QUORUM";
"brisk.consistencylevel.write", "QUORUM";
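A small worked check of R + W > N, where N is the replication factor and R and W are the number of replicas a read and a write must reach. The sketch assumes a quorum is a strict majority, N / 2 + 1 with integer division. The class and method names are hypothetical.

```java
// Illustrative only: arithmetic behind the R + W > N rule.
final class QuorumSketch {

    // A quorum is a strict majority of the N replicas.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // If R + W > N, every read set overlaps every write set in at least one replica.
    static boolean readOverlapsWrite(int r, int w, int n) {
        return r + w > n;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            int q = quorum(n);
            System.out.println("N=" + n + " quorum=" + q
                    + " overlap=" + readOverlapsWrite(q, q, n));
        }
    }
}
```

With both reads and writes at QUORUM, R + W = 2 * (N / 2 + 1), which exceeds N for every N, so a read always touches at least one replica that holds the latest acknowledged write. Reading and writing at a single replica (R = W = 1) with N = 3 does not give that guarantee.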
Hadoop: job tracker, task tracker
BriskSnitch: brisk nodes, cassandra nodes
BriskSimpleSnitch.java
if (TrackerInitializer.isTrackerNode) {
    myDC = BRISK_DC;
    logger.info("Detected Hadoop trackers are enabled, setting my DC to " + myDC);
} else {
    myDC = CASSANDRA_DC;
    logger.info("Looks like Vanilla Cassandra nodes, setting my DC to " + myDC);
}
Hive: SQL-like access
cli, hwi, jdbc, metastore
Pushdown predicates (v beta2)
hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);
hive> LOAD DATA LOCAL INPATH '$BRISK_HOME/resources/hive/examples/files/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='20080815');
hive> SELECT count(*), ds FROM invites GROUP BY ds;
http://www.datastax.com/docs/0.8/brisk/about_hive
ETL
Realtime
Cassandra CFs
DataCenters
Scale
No me in team!
● Ben Coverston
● Ben Werther
● Brandon Williams
● Cathy Daw
● Daria Hutchinson
● Jackson Chung
● Jake Luciani
● Joaquin Casares
● Jonathan Ellis
● Michael Allen
● Mike Bulman
● Michael Weir
● Nate McCall
● Nick M Bailey
● Patricio Echague
● Tyler Hobbs
● SriSatish Ambati
● Yewei Zhang
100-node Brisk Cluster on OpsCenter
[Figure: Brisk and Cassandra; Cassandra timeline: Bigtable, 2006; Dynamo, 2007; OSS, 2008; Incubator, 2009; TLP, 2010]
git clone git@github.com:riptano/brisk.git
http://www.datastax.com/product/brisk
Getting Started via Brisk AMI.
Mahalo. Thank You.
References
● MapReduce: Simplified Data Processing on Large Clusters, 2004, Jeffrey Dean and Sanjay Ghemawat, http://bit.ly/googmr_pdf
● Multicore MapReduce, Kunle, et al. http://bit.ly/iRJd1n