Treasure Data: Big Data Analytics on Heroku
Muga Nishizawa (@muga_nishizawa), Chief Software Architect, Treasure Data
Treasure Data Overview
• Founded to deliver big data analytics in days, not months, without specialist IT resources, for one-tenth the cost of other alternatives
• Service-based subscription business model
• World-class open source team
  • Founded world’s largest Hadoop User Group
  • Developed Fluentd and MessagePack
  • Contributed to Memcached, Hibernate, etc.
• Treasure Data is in production
  • 20 customers incl. Fortune 500 companies
  • 100+ billion records stored
  • Processing 10,000 messages per second
Our Customers – Fortune Global 500 leaders and start-ups including:
One Hundred Billion Records and Growing!
[Chart: stored records over time; y-axis ticks from 20 to 120, x-axis from Sep 2011 to Aug 2012]
Treasure Data Service
“Store Your Data Now for Future Insights”
Treasure Data Service
[Architecture diagram. Data sources: Apache, apps, RDBMS, other data sources. Treasure Data: columnar data storage, query processing cluster (MapReduce jobs), Query API (Hive; Pig to be supported) over JDBC and REST. Consumers: users via td-command, BI apps.]
2012-02-04 01:33:51 myappdb.buylog {"user": "12345", "path": "/buyItem", "price": 150, "referer": "/landing"}
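The sample event above has three parts: a timestamp, a `database.table` tag, and a JSON record. A minimal Python sketch of splitting such a line follows, assuming straight ASCII quotes and whitespace between the parts. The function name `parse_event` is hypothetical and is not part of any Treasure Data tool.

```python
import json
from datetime import datetime


def parse_event(line):
    """Split 'YYYY-MM-DD HH:MM:SS db.table {json}' into its parts."""
    # The JSON record starts at the first '{'.
    head, brace, body = line.partition("{")
    record = json.loads(brace + body)
    # Everything before the record is "<date> <time> <tag>".
    date_part, time_part, tag = head.split()
    timestamp = datetime.strptime(date_part + " " + time_part, "%Y-%m-%d %H:%M:%S")
    # The tag names the destination database and table.
    database, table = tag.split(".", 1)
    return timestamp, database, table, record


line = ('2012-02-04 01:33:51 myappdb.buylog '
        '{"user": "12345", "path": "/buyItem", "price": 150, "referer": "/landing"}')
timestamp, database, table, record = parse_event(line)
print(database, table, record["price"])
```

Here the tag `myappdb.buylog` resolves to database `myappdb` and table `buylog`, which are the names the query slides use.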
$ td query -w -d myappdb \
  "SELECT \
     TD_TIME_FORMAT(time, 'yyyy-MM-dd', 'PDT') AS day, \
     COUNT(1) AS cnt \
   FROM buylog \
   GROUP BY TD_TIME_FORMAT(time, 'yyyy-MM-dd', 'PDT') \
   ORDER BY cnt"
+------------+------+
| day        | cnt  |
+------------+------+
| 2012-05-26 | 4981 |
| 2012-05-27 | 4481 |
| 2012-05-28 |  481 |
+------------+------+
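To make the shape of that query concrete, here is a rough local equivalent in Python: bucket event times by calendar day in a time zone, count each bucket, and sort by count. This is only a sketch. It assumes `time` holds Unix timestamps and that `TD_TIME_FORMAT` formats them in the given zone; `PDT` is modeled as a fixed UTC-7 offset, and `count_per_day` and the sample values are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption: PDT treated as a fixed UTC-7 offset for illustration.
PDT = timezone(timedelta(hours=-7), "PDT")


def count_per_day(unix_times, tz=PDT):
    """Group timestamps by local day and return (day, cnt) pairs ordered by cnt."""
    counts = Counter()
    for t in unix_times:
        day = datetime.fromtimestamp(t, tz).strftime("%Y-%m-%d")
        counts[day] += 1
    # Plain ORDER BY cnt sorts ascending.
    return sorted(counts.items(), key=lambda item: item[1])


# Hypothetical sample: two events on 2012-05-26 and one on 2012-05-27 (PDT).
sample = [
    datetime(2012, 5, 26, 9, 0, tzinfo=PDT).timestamp(),
    datetime(2012, 5, 26, 23, 30, tzinfo=PDT).timestamp(),
    datetime(2012, 5, 27, 0, 15, tzinfo=PDT).timestamp(),
]
result = count_per_day(sample)
print(result)
```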
Comparing On-Premise & Cloud Big Data Markets
[Chart: deployment model (On-Premise, Cloud) against data volume (Low to High). Cloud: Database-as-a-Service, Big Data-as-a-Service. On-Premise: Traditional DBMS (ODS, Data Mart), Data Warehouse, Hadoop.]
© 2012 Forrester Research, Inc. Reproduction Prohibited
Treasure Data as Heroku Add-on
Demo with Heroku
Synergy Effect for Data-Driven Development!
The Power of the Cloud
• Easier to Scale
• Easier to Maintain
• Easier to Iterate
Implementation Process: Traditional DW and On-Premise Big Data
Dramatically streamlined implementation process
Heroku × Treasure Data
Viki.com: “Global Hulu”
Viki Before
• Hard to manage Hadoop
• Complicated data collection
Viki After
• No more Hadoop maintenance
• Versatile data collector, td-agent
Please Try It!
How Does It Work?
Query Processing
• Query Language
• Query Execution
• Columnar Data
• Object Storage
1/4: Compile SQL into MapReduce
SELECT COUNT(DISTINCT ip) FROM tbl;
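One common way to express `COUNT(DISTINCT ip)` as a MapReduce job is to emit each `ip` as a key, let the shuffle bring identical keys together, and count the groups in the reducer. The Python sketch below simulates those three phases in memory. It illustrates the idea only and is not the plan the service actually generates; the function names and the sample rows are hypothetical.

```python
from itertools import groupby


def map_phase(rows):
    """Map: emit (ip, None) for every row; only the key matters."""
    for row in rows:
        yield row["ip"], None


def shuffle(pairs):
    """Shuffle: sort by key so that identical keys form one group."""
    ordered = sorted(pairs, key=lambda pair: pair[0])
    for key, group in groupby(ordered, key=lambda pair: pair[0]):
        yield key, [value for _, value in group]


def reduce_phase(grouped):
    """Reduce: each distinct key arrives exactly once, so count the groups."""
    return sum(1 for _key, _values in grouped)


# Hypothetical rows standing in for table `tbl`.
tbl = [{"ip": "10.0.0.1"}, {"ip": "10.0.0.2"}, {"ip": "10.0.0.1"}]
distinct_ips = reduce_phase(shuffle(map_phase(tbl)))
print(distinct_ips)
```

On a real cluster the map calls run on many nodes at once and the shuffle moves data over the network; only the data flow is modeled here.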
2/4: MapReduce is executed in parallel
cc2.8xlarge cluster compute instance (up to 100 nodes * 32 threads)
SELECT COUNT(DISTINCT ip) FROM tbl;
3/4: Columnar Data Access
10Gbps Network
Read ONLY the Required Part of Data
SELECT COUNT(DISTINCT ip) FROM tbl;
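The point of a columnar layout is that a query touching one column never has to read the others. The following Python sketch models this with an in-memory store that records which columns were read. The `ColumnStore` class, its methods, and the sample rows are hypothetical and say nothing about Treasure Data's actual file format.

```python
def to_columns(rows):
    """Pivot row-oriented records into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


class ColumnStore:
    """Toy column store that logs every column it is asked to read."""

    def __init__(self, columns):
        self._columns = columns
        self.columns_read = []

    def read(self, name):
        self.columns_read.append(name)
        return self._columns[name]


# Hypothetical rows standing in for table `tbl`.
rows = [
    {"ip": "10.0.0.1", "path": "/buyItem", "price": 150},
    {"ip": "10.0.0.2", "path": "/landing", "price": 0},
    {"ip": "10.0.0.1", "path": "/buyItem", "price": 300},
]
store = ColumnStore(to_columns(rows))

# SELECT COUNT(DISTINCT ip) FROM tbl needs only the `ip` column.
distinct_ips = len(set(store.read("ip")))
print(distinct_ips, store.columns_read)
```

The `path` and `price` columns are never touched, which is the saving the slide refers to.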
4/4: Object-based Storage
Enjoy Data-Driven Development!
Big Data for the Rest of Us
www.treasure-data.com | @TreasureData
Great Investors
• Bill Tai
• Naren Gupta – Nexus Ventures, Director of Red Hat, TIBCO
• Dave Stamm – Clarify, Daisy Systems, Enkata
• Othman Laraki – Twitter
• James Lindenbaum, Adam Wiggins and Orion Henry – Heroku
• Anand Babu Periasamy and Hitesh Chellani – Gluster
• Yukihiro “Matz” Matsumoto – Creator of Ruby, now at Heroku
• Dan Scheinman – Former Cisco SVP
• Jean-Philippe Emelie Marcos – Tango, D.E. Shaw
• + executives from Cisco, Red Hat, Salesforce.com, GREE
What are your options?
Traditional
• Too much complexity
• Too long to get live
• Too expensive to maintain
• Can only innovate at speed of vendor
On-Premise Hadoop
• Never designed for analytic processing
• Too many people
• Too much software from too many sources
Cloud Hadoop
• Partial solution
• Vendor lock-in
Example Use Case – MySQL to TD