Benchmarking “No One Size Fits All” Big Data Analytics

BigFrame Team: The Hong Kong Polytechnic University, Duke University, HP Labs



Page 1: Benchmarking “No One Size Fits All” Big Data Analytics

Benchmarking “No One Size Fits All” Big Data Analytics

BigFrame Team

The Hong Kong Polytechnic University

Duke University

HP Labs


Analytics System Landscape

• MPP DB: Greenplum, SQL Server PDW, Teradata, etc.

• Columnar: Vertica, Redshift, Vectorwise, etc.

• MapReduce: Hadoop, Hive, HadoopDB, Tenzing, etc.

• Streaming: Storm, StreamBase, etc.

• Graph: Pregel, GraphLab, etc.

• Multi-tenancy: Mesos, YARN, etc.


What does this mean for Big Data Practitioners?


Gives them a lot of power!


Even the mighty may need a little help


Challenges for Practitioners

Which system to use for the app that I am developing?

• Features (e.g., graph data)

• Performance (e.g., claims like “System A is 50x faster than System B”)

• Resource efficiency

• Growth and scalability

• Multi-tenancy

App Developers, Data Scientists


Different parts of my app have different requirements

Compose "best-of-breed" systems or use a "one size fits all" system?


Managing many systems is hard!


System Admins, CIO

Total Cost of Ownership (TCO)?


Need Benchmarks


One Approach

Categorize systems

Develop a benchmark per system category


Useful, But ...

• MPP DB, Columnar: TPC-H/TPC-DS, Berkeley Big Data Benchmark, etc.

• MapReduce: TeraSort, DFSIO, GridMix, HiBench, etc.

• Streaming: Linear Road, etc.

• Graph: Graph 500, PageRank, etc.

• ...


Problem: May miss the Big Picture


• Cannot capture the complexities and end-to-end behavior of big data applications and deployments:
  o Bottlenecks
  o Data conversion, transfer, & loading overheads
  o Storage costs & other parts of the data life-cycle
  o Resource management challenges
  o Total Cost of Ownership (TCO)


A Better Approach:

BigBench or Deep Analytics Pipeline:

• Application-driven

• Involve multiple types of data:
  o Structured
  o Semi-structured
  o Unstructured

• Involve multiple types of operators:
  o Relational operators: join, group by
  o Text analytics: sentiment analysis
  o Machine learning


Problem:

Give a man a fish and you will feed him for a day.

Give him fishing gear and you will feed him for life.

--Anonymous

[Figure: Benchmark vs. Benchmark Generator]


BigFrame: A Benchmark Generator for Big Data Analytics


How a user uses BigFrame

• Through the BigFrame Interface, the user provides a bigif (benchmark input format).

• The Benchmark Generator turns the bigif into a bigspec (benchmark specification).

• The Benchmark Driver for the System Under Test (e.g., Hive, MapReduce, HBase) runs the benchmark and returns the result.
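The workflow can be sketched as a driver abstraction: each system under test gets its own driver that loads the data and runs the queries taken from the bigspec, reporting an answer and an elapsed time per query. This is a minimal Python sketch under assumptions; `BenchmarkDriver` and `InMemoryDriver` are hypothetical names, not BigFrame's published API, and the toy in-memory driver only stands in for a real system such as Hive or HBase.

```python
import time
from abc import ABC, abstractmethod


class BenchmarkDriver(ABC):
    """Hypothetical driver interface: one driver per system under test (e.g., Hive, HBase)."""

    @abstractmethod
    def load(self, data: dict[str, list[dict]]) -> None:
        """Load the initial data into the system under test."""

    @abstractmethod
    def execute(self, query: str) -> object:
        """Run one query on the system under test and return its answer."""

    def run(self, data: dict[str, list[dict]], queries: list[str]) -> dict[str, tuple[object, float]]:
        """Run the benchmark: load the data, then time every query."""
        self.load(data)
        report = {}
        for query in queries:
            start = time.perf_counter()
            answer = self.execute(query)
            report[query] = (answer, time.perf_counter() - start)
        return report


class InMemoryDriver(BenchmarkDriver):
    """Toy system under test: a 'query' is a table name and the answer is its row count."""

    def __init__(self) -> None:
        self.tables: dict[str, list[dict]] = {}

    def load(self, data: dict[str, list[dict]]) -> None:
        self.tables = {name: list(rows) for name, rows in data.items()}

    def execute(self, query: str) -> object:
        return len(self.tables.get(query, []))


report = InMemoryDriver().run({"web_sales": [{"item_id": 1}, {"item_id": 2}]}, ["web_sales"])
print(report["web_sales"][0])  # 2; the elapsed time in report["web_sales"][1] varies
```

Keeping `run` in the base class means every driver is timed the same way; only loading and query execution are system-specific.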


bigspec: Benchmark Specification

[Figure: Hive, MapReduce, HBase]


What should the benchmark input format capture?

• The 3Vs: Volume, Velocity, Variety


bigif: BigFrame's InputFormat


Benchmark Generation

[Figure: bigif (benchmark input format) → Benchmark Generator → bigspec (benchmark specification)]

bigif describes points in a discrete space of {Data, Query} × {Variety, Volume, Velocity}
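To make the discrete space concrete, the sketch below crosses {Data, Query} with the 3Vs to get six dimensions and enumerates every bigif point by choosing one level per dimension. The levels in `LEVELS` are assumptions for illustration: only values such as Relational, Text, Micro, Macro, Continuous, and Fast appear on the slides, and the rest are placeholders.

```python
from itertools import product
from math import prod

# The two things a bigif describes, crossed with the 3Vs.
TARGETS = ("Data", "Query")
THREE_VS = ("Variety", "Volume", "Velocity")
DIMENSIONS = list(product(TARGETS, THREE_VS))  # 6 (target, V) dimensions

# Hypothetical discrete levels per dimension; several are placeholders,
# not values taken from BigFrame.
LEVELS = {
    ("Data", "Variety"): ("Relational", "Relational+Text"),
    ("Data", "Volume"): ("Small", "Large"),
    ("Data", "Velocity"): ("Static", "Fast"),
    ("Query", "Variety"): ("Micro", "Macro", "Continuous"),
    ("Query", "Volume"): ("Few", "Many"),
    ("Query", "Velocity"): ("Sequential", "Concurrent"),
}


def all_points():
    """Yield every bigif point: one level chosen for each of the 6 dimensions."""
    for choice in product(*(LEVELS[d] for d in DIMENSIONS)):
        yield dict(zip(DIMENSIONS, choice))


print(len(DIMENSIONS))                           # 6
print(prod(len(LEVELS[d]) for d in DIMENSIONS))  # 96 points in this toy space
```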

The generated benchmark specification (bigspec) contains:

1. Initial data to load
2. Data refresh pattern
3. Query streams
4. Evaluation metrics

Benchmark generation can be addressed as a search problem within a rich application domain
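One way to read generation as search is sketched below: given a bigif point, the generator searches a catalog of the application domain for the datasets and queries that match it and emits a bigspec with the four components listed above. The catalog contents, the `generate` function, and the chosen metrics are hypothetical; this illustrates the idea and is not BigFrame's actual generator.

```python
from dataclasses import dataclass

# Hypothetical slice of the application domain: dataset -> data variety.
DATASETS = {"item": "Relational", "web_sales": "Relational",
            "promotion": "Relational", "tweets": "Text"}
# Each candidate query: (name, query variety, tables it reads).
QUERIES = [
    ("sales_by_item", "Micro", {"web_sales", "item"}),
    ("promo_revenue", "Micro", {"web_sales", "promotion"}),
    ("sentiment_vs_sales", "Macro", {"web_sales", "item", "tweets"}),
]


@dataclass
class BigSpec:
    initial_data: list[str]          # 1. initial data to load
    refresh_pattern: str             # 2. data refresh pattern
    query_streams: list[list[str]]   # 3. query streams
    metrics: list[str]               # 4. evaluation metrics


def generate(data_variety: set[str], query_variety: set[str], fast: bool) -> BigSpec:
    """Search the domain for the datasets and queries that match one bigif point."""
    # Keep only datasets whose variety was requested.
    tables = sorted(t for t, kind in DATASETS.items() if kind in data_variety)
    # Keep queries of a requested variety that read only the selected datasets.
    chosen = [name for name, kind, reads in QUERIES
              if kind in query_variety and reads <= set(tables)]
    return BigSpec(
        initial_data=tables,
        refresh_pattern="continuous" if fast else "none",
        query_streams=[chosen],
        metrics=["latency"] if fast else ["elapsed_time"],
    )


spec = generate({"Relational"}, {"Micro"}, fast=False)
print(spec.initial_data)   # ['item', 'promotion', 'web_sales']
print(spec.query_streams)  # [['sales_by_item', 'promo_revenue']]
```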


Application Domain Modeled Currently

• E-commerce: sales, promotions, recommendations

• Social media: sentiment & influence


[Figure: schema of the modeled domain, including Item, Web_sales, and Promotion]


Use Case 1: Exploratory BI

• Large volumes of relational data

• Mostly aggregations and few joins

• Can Spark's performance match that of an MPP DB?

BigFrame will generate a benchmark specification containing relational data and (SQL-ish) queries

Data Variety = {Relational}

Query Variety = {Micro}
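As an illustration of a Micro query in this use case, the sketch below encodes the slide's bigif values and runs one aggregation with a single join over toy rows. The row layout, column names, and the dict form of the bigif are assumptions loosely based on the Item and Web_sales entities; a real benchmark would run such queries on Spark or an MPP DB rather than in plain Python.

```python
from collections import defaultdict

# Use Case 1's bigif values from the slide, in a hypothetical dict layout.
USE_CASE_1 = {"data_variety": {"Relational"}, "query_variety": {"Micro"}}

# Toy relational data (illustrative names only).
web_sales = [
    {"item_id": 1, "price": 10.0, "qty": 2},
    {"item_id": 2, "price": 5.0, "qty": 1},
    {"item_id": 1, "price": 10.0, "qty": 1},
]
items = {1: "laptop sleeve", 2: "usb cable"}


def revenue_by_item(sales, item_names):
    """Micro query: join each sale to its item, then sum revenue per item (GROUP BY)."""
    totals = defaultdict(float)
    for row in sales:
        totals[item_names[row["item_id"]]] += row["price"] * row["qty"]
    return dict(totals)


print(revenue_by_item(web_sales, items))  # {'laptop sleeve': 30.0, 'usb cable': 5.0}
```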


Use Case 2: Complex BI

• Large volumes of relational data

• Even larger volumes of text data

• Combined analytics

Data Variety = {Relational, Text}

Query Variety = {Macro} (application-focused instead of micro-benchmark)

BigFrame will generate a benchmark specification that includes sentiment analysis tasks over tweets
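A toy version of such combined analytics joins a relational fact (units sold per item) with a sentiment score computed over tweets about that item. The word lists, tweet texts, and function names are hypothetical, and the word-count scoring is a deliberately naive stand-in for real sentiment analysis.

```python
# Hypothetical lexicons; real sentiment analysis would use a trained model.
POSITIVE = {"love", "great", "fast"}
NEGATIVE = {"broken", "slow", "hate"}


def sentiment(text: str) -> int:
    """Naive score: +1 per positive word, -1 per negative word."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


# Text side: tweets mentioning items. Relational side: units sold per item.
tweets = [
    {"item_id": 1, "text": "love this sleeve great fit"},
    {"item_id": 2, "text": "cable arrived broken"},
    {"item_id": 1, "text": "shipping was slow"},
]
sales_qty = {1: 3, 2: 1}


def sentiment_vs_sales(tweets, sales_qty):
    """Combined analytics: per item, (total tweet sentiment, units sold)."""
    scores = {}
    for t in tweets:
        scores[t["item_id"]] = scores.get(t["item_id"], 0) + sentiment(t["text"])
    return {item: (scores.get(item, 0), qty) for item, qty in sales_qty.items()}


print(sentiment_vs_sales(tweets, sales_qty))  # {1: (1, 3), 2: (-1, 1)}
```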


Use Case 3: Dashboards

• Large volume and velocity of relational and text data

• Continuously updated dashboards

Data Velocity = Fast

Query Variety = Continuous (as opposed to exploratory)

BigFrame will generate a benchmark specification that includes data refresh as well as continuous queries whose results change upon data refresh
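A continuous query can be maintained incrementally: each data refresh folds a new batch into the running result instead of recomputing it, so the dashboard's answer changes with every refresh. This is a minimal sketch; `ContinuousCount` and the row layout are hypothetical, and a streaming system such as Storm would play this role in practice.

```python
from collections import Counter


class ContinuousCount:
    """Continuous query for a dashboard tile: mentions per item, kept current across refreshes."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def refresh(self, batch: list[dict]) -> dict:
        """Apply one data-refresh batch incrementally and return the updated result."""
        self.counts.update(row["item_id"] for row in batch)
        return dict(self.counts)


q = ContinuousCount()
print(q.refresh([{"item_id": 1}, {"item_id": 2}]))  # {1: 1, 2: 1}
print(q.refresh([{"item_id": 1}]))                  # {1: 2, 2: 1}
```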


Working with the community

• First release of BigFrame planned for August 2013
  o Open source with extensibility APIs

• Benchmark Driver for more systems

• Utilities (accessed through the Benchmark Driver to drill down into system behavior during benchmarking)

• Instantiate the BigFrame pipeline for more app domains


Take Away

• Benchmarks shape a field (for better or worse); they are how we determine the value of change.

-- David Patterson, University of California, Berkeley, 1994

• Benchmarks meet different needs for different people:
  o End customers, application developers, system designers, system administrators, researchers, CIOs

• BigFrame helps users generate benchmarks that best meet their needs