dc area spark interactive nov 2015
TRANSCRIPT
1© Cloudera, Inc. All rights reserved.
Jean-‐Daniel Cryans on behalf of the Kudu team
Kudu: a new storage engine for fast analytics on fast data
2© Cloudera, Inc. All rights reserved.
Myself
• Software Engineer at Cloudera• On the Kudu team for 2 years• Apache HBase committer and PMC member since 2008• Previously at StumbleUpon
3© Cloudera, Inc. All rights reserved.
Motivation and GoalsWhy build Kudu?
3
4© Cloudera, Inc. All rights reserved.
Motivating Questions
• Are there user problems that can we can’t address because of gaps in Hadoopecosystem storage technologies?• Are we positioned to take advantage of advancements in the hardware landscape?
5© Cloudera, Inc. All rights reserved.
Questions to the crowd
•What is Apache Parquet good at?•What is Apache HBase good at?
6© Cloudera, Inc. All rights reserved.
Current Storage Landscape in HadoopHDFS excels at:• Efficiently scanning large amounts
of data• Accumulating data with high
throughputHBase excels at:• Efficiently finding and writing
individual rows• Making data mutable
Gaps exist when these properties are needed simultaneously
7© Cloudera, Inc. All rights reserved.
• High throughput for big scans (columnar storage and replication)Goal:Within 2x of Parquet
• Low-‐latency for short accesses (primary key indexes and quorum replication)Goal: 1ms read/write on SSD
• Database-‐like semantics (initially single-‐row ACID)
• Relational data model• SQL query• “NoSQL” style scan/insert/update (Java client)
Kudu Design Goals
8© Cloudera, Inc. All rights reserved.
Questions to the crowd
• How much RAM do typical servers have today?•What about in 2 years?• How many of you have hybrid ssd/hdd servers?•Who’s currently running/planning to run/would like to run with non-‐volatile memory express drives (NVMe)?
9© Cloudera, Inc. All rights reserved.
Changing Hardware landscape
• Spinning disk -‐> solid state storage• NAND flash: Up to 450k read 250k write iops, about 2GB/sec read and 1.5GB/sec write throughput, at a price of less than $3/GB and dropping• 3D XPoint memory (1000x faster than NAND, cheaper than RAM)
• RAM is cheaper and more abundant:• 64-‐>128-‐>256GB over last few years
• Takeaway 1: The next bottleneck is CPU, and current storage systems weren’t designed with CPU efficiency in mind.• Takeaway 2: Column stores are feasible for random access
10© Cloudera, Inc. All rights reserved.
Kudu Usage
• Table has a SQL-‐like schema• Finite number of columns (unlike HBase/Cassandra)• Types: BOOL, INT8, INT16, INT32, INT64, FLOAT, DOUBLE, STRING, BINARY, TIMESTAMP• Some subset of columns makes up a possibly-‐composite primary key• Fast ALTER TABLE
• Java and C++ “NoSQL” style APIs• Insert(), Update(), Delete(), Scan()
• Integrations with MapReduce, Spark, and Impala•more to come!
10
11© Cloudera, Inc. All rights reserved.
Use cases and architectures
12© Cloudera, Inc. All rights reserved.
Kudu Use Cases
Kudu is best for use cases requiring a simultaneous combination ofsequential and random reads and writes
● Time Series○ Examples: Stream market data; fraud detection & prevention; risk monitoring○ Workload: Insert, updates, scans, lookups
●Machine Data Analytics○ Examples: Network threat detection○ Workload: Inserts, scans, lookups
●Online Reporting○ Examples: OperationalData Store (ODS)○ Workload: Inserts, updates, scans, lookups
13© Cloudera, Inc. All rights reserved.
Questions to the crowd
• How do you manage incoming data when running reports on data stored in Parquet?
14© Cloudera, Inc. All rights reserved.
Real-‐Time Analytics in Hadoop TodayFraud Detection in the Real World = Storage Complexity
Considerations:● How do I handle failure
during this process?
● How often do I reorganize data streaming in into a format appropriate for reporting?
● When reporting, how do I see data that has not yet been reorganized?
● How do I ensure that important jobs aren’t interrupted by maintenance?
HBaseHave we
accumulated enough data?
Incoming Data (Messaging System)
Parquet File
Reorganize HBase file into Parquet
Reporting Request
New Partition
Most Recent Partition
Historic Data
Impala on HDFS
• Wait for running operations to complete • Define new Impala partition referencing the newly written Parquet file
15© Cloudera, Inc. All rights reserved.
Real-‐Time Analytics in Hadoop with Kudu
Improvements:● One system to operate
● No cron jobs or background processes
● Handle late arrivals or data corrections with ease
● New data available immediately for analytics or operations
Historical and Real-‐timeData
Incoming Data (Messaging System)
Reporting Request
Storage in Kudu
16© Cloudera, Inc. All rights reserved.
How it works
16
17© Cloudera, Inc. All rights reserved.
Tables and Tablets
• Table is horizontally partitioned into tablets• Range or hash partitioning• PRIMARY KEY (host, metric, timestamp) DISTRIBUTE BY HASH(timestamp) INTO 100 BUCKETS
• Each tablet has N replicas (3 or 5), with Raft consensus• Allow read from any replica, plus leader-‐driven writes with low MTTR
• Tablet servers host tablets• Store data on local disks (no HDFS)
17
18© Cloudera, Inc. All rights reserved.
Questions to the crowd
• How many failure(s) can a quorum of 3 nodes tolerate?
19© Cloudera, Inc. All rights reserved.
20© Cloudera, Inc. All rights reserved.
Questions to the crowd
•What are the main advantages of using an LSM tree when inserting/mutating rows, like in HBase and Cassandra?• In which situations can column-‐stores can read data faster than row-‐stores?
21© Cloudera, Inc. All rights reserved.
Kudu trade-‐offs
• Random updates will be slower• HBase model allows random updates without incurring a disk seek• Kudu requires a key lookup before update, bloom lookup before insert, may incur seeks
• Single-‐row reads may be slower• Columnar design is optimized for scans• Especially slow at reading a row that has had many recent updates (e.g YCSB “zipfian”)
21
22© Cloudera, Inc. All rights reserved.
Benchmarks
22
23© Cloudera, Inc. All rights reserved.
TPC-‐H (Analytics benchmark)
• 75TS + 1 master cluster• 12 (spinning) disk each, enough RAM to fit dataset• Using Kudu 0.5.0, Impala 2.2 with Kudu support, CDH 5.4• TPC-‐H Scale Factor 100 (100GB)
• Example query:• SELECT n_name, sum(l_extendedprice * (1 - l_discount)) as revenue FROM customer, orders, lineitem, supplier, nation, region WHERE c_custkey = o_custkey AND l_orderkey = o_orderkey AND l_suppkey = s_suppkey AND c_nationkey = s_nationkeyAND s_nationkey = n_nationkey AND n_regionkey = r_regionkey AND r_name = 'ASIA' AND o_orderdate >= date '1994-01-01' AND o_orderdate < '1995-01-01’ GROUP BY n_name ORDER BY revenue desc;
23
24© Cloudera, Inc. All rights reserved.
-‐ Kudu outperforms Parquet by 31% (geometric mean) for RAM-‐resident data-‐ Parquet likely to outperform Kudu for HDD-‐resident (larger IO requests)
25© Cloudera, Inc. All rights reserved.
What about Apache Phoenix?• 10 node cluster (9 worker, 1 master)• HBase 1.0, Phoenix 4.3• TPC-‐H LINEITEM table only (6B rows)
25
2152
21976
131
0.04
1918
13.2
1.7
0.7
0.15
155
9.3
1.4 1.5 1.37
0.01
0.1
1
10
100
1000
10000Load TPCH Q1 COUNT(*)
COUNT(*)WHERE…
single-‐rowlookup
Time (sec)
Phoenix
Kudu
Parquet
26© Cloudera, Inc. All rights reserved.
What about NoSQL-‐style random access? (YCSB)
• YCSB 0.5.0-‐snapshot• 10 node cluster(9 worker, 1 master)• HBase 1.0• 100M rows, 10M ops
26
27© Cloudera, Inc. All rights reserved.
Apache Spark
27
28© Cloudera, Inc. All rights reserved.
Motivational Quotes
• “seeing kudu as a great fit for Spark Streaming workloads”
• “One unexpected benefit I am seeing from Kudu is the filter performance on non-‐key columns -‐ subsec on 2 million rows of meetup data... (on small ec2 instance)”• Kudu user in our Slack channel, 11/03/15
28
29© Cloudera, Inc. All rights reserved.
Spark Streaming
29
Live ingestion
Get latest data
Build models
30© Cloudera, Inc. All rights reserved.
Spark SQL
• Great fit for Dataframes• Kudu natively supports types
• Can replace Parquet storage• Kudu’s goal is to be almost as fast as Parquet• Predicate pushdown + lazy materialization• Plus makes it possible to mutate data
• A better alternative to other “NoSQL” storage engines• Aims to offer sequential consistency, 2nd only to strict consistency• Consistency can be relaxed to enable faster reading of stale data
31© Cloudera, Inc. All rights reserved.
Code snippetsval kuduContext = new KuduContext(sparkContext, kuduMaster)
val kuduRdd = kuduContext.kuduRDD(tableName, ”c1").map(r => {
val row = r._2
val c1 = row.getLong(0)etc… })
someStream.kuduForeachPartition(kuduContext, (it, kuduClient, asyncClient) => {
val table = kuduClient.openTable(tableName)
etc… })
sqlContext.load("org.kududb.spark”,
Map("kudu.table" -> ”example_table”,
"kudu.master”-> kuduMaster)).registerTempTable(“example_table")
32© Cloudera, Inc. All rights reserved.
Demo
32
33© Cloudera, Inc. All rights reserved.
Demo
33
• Code currently at https://github.com/tmalaska/SparkOnKudu/
Gamer data points
Producer sends datapoints to Kafka
Spark pulls from Kafka Spark loads basedata from Kudu
Aggregates are storedback into Kudu
Live queries comefrom Impala
34© Cloudera, Inc. All rights reserved.
Demo
35© Cloudera, Inc. All rights reserved.
Getting started
35
36© Cloudera, Inc. All rights reserved.
Project status
• Public Beta released September 28th 2015, version 0.5.0• Not ready for production• No security• Feedback/jiras/patches welcome
• Next release in this month (0.6.0):•Mac OSX support for single node deployment• Initial Apache Spark integration• Lots of small fixes and improvements
• GA sometime next year (hopefully!)
36
37© Cloudera, Inc. All rights reserved.
Getting started as a user
• http://getkudu.io• kudu-‐[email protected]• http://getkudu-‐slack.herokuapp.com/
• Quickstart VM• Easiest way to get started• Impala and Kudu in an easy-‐to-‐install VM
• CSD and Parcels• For installation on a Cloudera Manager-‐managed cluster
37
38© Cloudera, Inc. All rights reserved.
Getting started as a developer
• http://github.com/cloudera/kudu• All commits go here first
• Public gerrit: http://gerrit.cloudera.org• All code reviews happening here
• Public JIRA: http://issues.cloudera.org• Includes bugs going back to 2013. Come see our dirty laundry!
• kudu-‐[email protected]
• Apache 2.0 license open source• Contributions are welcome and encouraged!
38
39© Cloudera, Inc. All rights reserved.
http://getkudu.io/@getkudu