dc area spark interactive nov 2015

1© Cloudera, Inc. All rights reserved.

Jean-‐Daniel Cryans on behalf of the Kudu team

Kudu: a new storage engine for fast analytics on fast data


Myself

• Software Engineer at Cloudera• On the Kudu team for 2 years• Apache HBase committer and PMC member since 2008• Previously at StumbleUpon


Motivation and GoalsWhy build Kudu?

3


Motivating Questions

• Are there user problems that can we can’t address because of gaps in Hadoopecosystem storage technologies?• Are we positioned to take advantage of advancements in the hardware landscape?


Questions to the crowd

•What is Apache Parquet good at?•What is Apache HBase good at?


Current Storage Landscape in HadoopHDFS excels at:• Efficiently scanning large amounts

of data• Accumulating data with high

throughputHBase excels at:• Efficiently finding and writing

individual rows• Making data mutable

Gaps exist when these properties are needed simultaneously


• High throughput for big scans (columnar storage and replication)Goal:Within 2x of Parquet

• Low-‐latency for short accesses (primary key indexes and quorum replication)Goal: 1ms read/write on SSD

• Database-‐like semantics (initially single-‐row ACID)

• Relational data model• SQL query• “NoSQL” style scan/insert/update (Java client)

Kudu Design Goals



• How much RAM do typical servers have today?•What about in 2 years?• How many of you have hybrid ssd/hdd servers?•Who’s currently running/planning to run/would like to run with non-‐volatile memory express drives (NVMe)?


Changing Hardware landscape

• Spinning disk -‐> solid state storage• NAND flash: Up to 450k read 250k write iops, about 2GB/sec read and 1.5GB/sec write throughput, at a price of less than $3/GB and dropping• 3D XPoint memory (1000x faster than NAND, cheaper than RAM)

• RAM is cheaper and more abundant:• 64-‐>128-‐>256GB over last few years

• Takeaway 1: The next bottleneck is CPU, and current storage systems weren’t designed with CPU efficiency in mind.• Takeaway 2: Column stores are feasible for random access


Kudu Usage

• Table has a SQL-‐like schema• Finite number of columns (unlike HBase/Cassandra)• Types: BOOL, INT8, INT16, INT32, INT64, FLOAT, DOUBLE, STRING, BINARY, TIMESTAMP• Some subset of columns makes up a possibly-‐composite primary key• Fast ALTER TABLE

• Java and C++ “NoSQL” style APIs• Insert(), Update(), Delete(), Scan()

• Integrations with MapReduce, Spark, and Impala•more to come!

10


Use cases and architectures


Kudu Use Cases

Kudu is best for use cases requiring a simultaneous combination ofsequential and random reads and writes

● Time Series○ Examples: Stream market data; fraud detection & prevention; risk monitoring○ Workload: Insert, updates, scans, lookups

●Machine Data Analytics○ Examples: Network threat detection○ Workload: Inserts, scans, lookups

●Online Reporting○ Examples: OperationalData Store (ODS)○ Workload: Inserts, updates, scans, lookups



• How do you manage incoming data when running reports on data stored in Parquet?


Real-‐Time Analytics in Hadoop TodayFraud Detection in the Real World = Storage Complexity

Considerations:● How do I handle failure

during this process?

● How often do I reorganize data streaming in into a format appropriate for reporting?

● When reporting, how do I see data that has not yet been reorganized?

● How do I ensure that important jobs aren’t interrupted by maintenance?

HBaseHave we

accumulated enough data?

Incoming Data (Messaging System)

Parquet File

Reorganize HBase file into Parquet

Reporting Request

New Partition

Most Recent Partition

Historic Data

Impala on HDFS

• Wait for running operations to complete • Define new Impala partition referencing the newly written Parquet file


Real-‐Time Analytics in Hadoop with Kudu

Improvements:● One system to operate

● No cron jobs or background processes

● Handle late arrivals or data corrections with ease

● New data available immediately for analytics or operations

Historical and Real-‐timeData

Incoming Data (Messaging System)

Reporting Request

Storage in Kudu


How it works

16


Tables and Tablets

• Table is horizontally partitioned into tablets• Range or hash partitioning• PRIMARY KEY (host, metric, timestamp) DISTRIBUTE BY HASH(timestamp) INTO 100 BUCKETS

• Each tablet has N replicas (3 or 5), with Raft consensus• Allow read from any replica, plus leader-‐driven writes with low MTTR

• Tablet servers host tablets• Store data on local disks (no HDFS)

17



• How many failure(s) can a quorum of 3 nodes tolerate?



•What are the main advantages of using an LSM tree when inserting/mutating rows, like in HBase and Cassandra?• In which situations can column-‐stores can read data faster than row-‐stores?


Kudu trade-‐offs

• Random updates will be slower• HBase model allows random updates without incurring a disk seek• Kudu requires a key lookup before update, bloom lookup before insert, may incur seeks

• Single-‐row reads may be slower• Columnar design is optimized for scans• Especially slow at reading a row that has had many recent updates (e.g YCSB “zipfian”)

21


Benchmarks

22


TPC-‐H (Analytics benchmark)

• 75TS + 1 master cluster• 12 (spinning) disk each, enough RAM to fit dataset• Using Kudu 0.5.0, Impala 2.2 with Kudu support, CDH 5.4• TPC-‐H Scale Factor 100 (100GB)

• Example query:• SELECT n_name, sum(l_extendedprice * (1 - l_discount)) as revenue FROM customer, orders, lineitem, supplier, nation, region WHERE c_custkey = o_custkey AND l_orderkey = o_orderkey AND l_suppkey = s_suppkey AND c_nationkey = s_nationkeyAND s_nationkey = n_nationkey AND n_regionkey = r_regionkey AND r_name = 'ASIA' AND o_orderdate >= date '1994-01-01' AND o_orderdate < '1995-01-01’ GROUP BY n_name ORDER BY revenue desc;

23


-‐ Kudu outperforms Parquet by 31% (geometric mean) for RAM-‐resident data-‐ Parquet likely to outperform Kudu for HDD-‐resident (larger IO requests)


What about Apache Phoenix?• 10 node cluster (9 worker, 1 master)• HBase 1.0, Phoenix 4.3• TPC-‐H LINEITEM table only (6B rows)

25

2152

21976

131

0.04

1918

13.2

1.7

0.7

0.15

155

9.3

1.4 1.5 1.37

0.01

0.1

1

10

100

1000

10000Load TPCH Q1 COUNT(*)

COUNT(*)WHERE…

single-‐rowlookup

Time (sec)

Phoenix

Kudu

Parquet


What about NoSQL-‐style random access? (YCSB)

• YCSB 0.5.0-‐snapshot• 10 node cluster(9 worker, 1 master)• HBase 1.0• 100M rows, 10M ops

26


Apache Spark

27


Motivational Quotes

• “seeing kudu as a great fit for Spark Streaming workloads”

• “One unexpected benefit I am seeing from Kudu is the filter performance on non-‐key columns -‐ subsec on 2 million rows of meetup data... (on small ec2 instance)”• Kudu user in our Slack channel, 11/03/15

28


Spark Streaming

29

Live ingestion

Get latest data

Build models


Spark SQL

• Great fit for Dataframes• Kudu natively supports types

• Can replace Parquet storage• Kudu’s goal is to be almost as fast as Parquet• Predicate pushdown + lazy materialization• Plus makes it possible to mutate data

• A better alternative to other “NoSQL” storage engines• Aims to offer sequential consistency, 2nd only to strict consistency• Consistency can be relaxed to enable faster reading of stale data


Code snippetsval kuduContext = new KuduContext(sparkContext, kuduMaster)

val kuduRdd = kuduContext.kuduRDD(tableName, ”c1").map(r => {

val row = r._2

val c1 = row.getLong(0)etc… })

someStream.kuduForeachPartition(kuduContext, (it, kuduClient, asyncClient) => {

val table = kuduClient.openTable(tableName)

etc… })

sqlContext.load("org.kududb.spark”,

Map("kudu.table" -> ”example_table”,

"kudu.master”-> kuduMaster)).registerTempTable(“example_table")


Demo

32


Demo

33

• Code currently at https://github.com/tmalaska/SparkOnKudu/

Gamer data points

Producer sends datapoints to Kafka

Spark pulls from Kafka Spark loads basedata from Kudu

Aggregates are storedback into Kudu

Live queries comefrom Impala


Demo


Getting started

35


Project status

• Public Beta released September 28th 2015, version 0.5.0• Not ready for production• No security• Feedback/jiras/patches welcome

• Next release in this month (0.6.0):•Mac OSX support for single node deployment• Initial Apache Spark integration• Lots of small fixes and improvements

• GA sometime next year (hopefully!)

36


Getting started as a user

• http://getkudu.io• kudu-‐[email protected]• http://getkudu-‐slack.herokuapp.com/

• Quickstart VM• Easiest way to get started• Impala and Kudu in an easy-‐to-‐install VM

• CSD and Parcels• For installation on a Cloudera Manager-‐managed cluster

37


Getting started as a developer

• http://github.com/cloudera/kudu• All commits go here first

• Public gerrit: http://gerrit.cloudera.org• All code reviews happening here

• Public JIRA: http://issues.cloudera.org• Includes bugs going back to 2013. Come see our dirty laundry!

• kudu-‐[email protected]

• Apache 2.0 license open source• Contributions are welcome and encouraged!

38


http://getkudu.io/@getkudu

dc area spark interactive nov 2015

Documents