Stratosphere
Next Generation Big Data Analytics Platform from Europe

Márton Balassi
Data Mining and Search Group¹
Big Data Business Intelligence Group¹

¹ Computer and Automation Research Institute of the Hungarian Academy of Sciences

May 11, 2014

Table of Contents

Motivation

The 10 commandments for Big Data Analytics

Project info

Motivation

The Big Data scene

What’s all the hype for?

- Data acquisition is cheap (Eva Andreasson, Cloudera, 2014)
- Data storage is cheap (Matthew Komorowski, 2014)
- Data Science is the Sexiest Job of the 21st century (Harvard Business Review, 2012)
- It’s a piece of cake ... Or is it?

[Figure: the Big Data scene. Image courtesy of Matt Turck and Shivon Zilis.]

The 10 commandments for Big Data Analytics

Stratosphere in one slide

A European project just accepted to the Apache Incubator

- Declarative programming (native language bindings)
- Schema-on-read (HDFS, databases, ...)
- Rich programming model (beyond MapReduce)
- User-defined functions as first-class citizens
- Automated parallelization and optimization
- Efficient and scalable execution engine
- Intensive development

Contributors

1. Thou shalt use declarative programming

K-Means Clustering in Stratosphere’s Scala front-end
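The code shown on this slide did not survive in the transcript. As a stand-in, here is a minimal sketch of k-means written in a declarative style in plain Python. It does not use Stratosphere's Scala front-end or any of its API; the function names (`nearest`, `kmeans_step`, `kmeans`) and the sample points are assumptions made for illustration only.

```python
from math import dist
from statistics import fmean


def nearest(point, centroids):
    """Return the index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def kmeans_step(points, centroids):
    """One iteration: group points by nearest centroid, then average each group."""
    groups = {}
    for p in points:
        groups.setdefault(nearest(p, centroids), []).append(p)
    # A centroid with no assigned points keeps its old position.
    return [
        tuple(fmean(axis) for axis in zip(*groups[i])) if i in groups else c
        for i, c in enumerate(centroids)
    ]


def kmeans(points, centroids, iterations=10):
    """Apply the step a fixed number of times (hypothetical stopping rule)."""
    for _ in range(iterations):
        centroids = kmeans_step(points, centroids)
    return centroids


# Hypothetical sample data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
result = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

The program states what is computed (assign, group, average) and leaves the order and placement of the work open, which is the property the commandment asks for.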

2. Thou shalt accept external (dynamic) sources

“In situ” data – no load

3. Thou shalt use rich primitives

Beyond MapReduce
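The transcript does not list the primitives shown on this slide. As one plausible example of an operator that goes beyond map and reduce, the sketch below implements an equi-join over two inputs in plain Python. The helper `hash_join` and the sample records are hypothetical and are not taken from Stratosphere.

```python
from collections import defaultdict


def hash_join(left, right, left_key, right_key):
    """Equi-join: return (left, right) pairs whose keys are equal."""
    # Build a hash index over the right input, then probe it with the left.
    index = defaultdict(list)
    for rec in right:
        index[right_key(rec)].append(rec)
    return [(l, r) for l in left for r in index.get(left_key(l), [])]


# Hypothetical inputs: (user id, name) and (user id, visited page).
users = [(1, "ann"), (2, "bob")]
visits = [(1, "/home"), (1, "/docs"), (3, "/faq")]
joined = hash_join(users, visits, lambda u: u[0], lambda v: v[0])
```

With join available as a primitive, the system rather than the user can decide how to execute it, for example which input to index.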

4. Thou shalt deeply embed UDFs

Flexible and transparent

5. Thou shalt optimize

Auto-parallelization and optimization as in relational databases

6. Thou shalt iterate

Needed for most interesting analysis cases
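To illustrate why iteration matters, here is a minimal sketch of an iterative analysis in plain Python: connected components by label propagation, repeated until nothing changes. The function name and the sample graph are assumptions; the slide itself does not name an algorithm.

```python
def connected_components(vertices, edges):
    """For every edge, give both endpoints the smaller of their two labels;
    repeat passes over the edge list until a pass changes nothing."""
    label = {v: v for v in vertices}
    changed = True
    while changed:
        changed = False
        for a, b in edges:
            low = min(label[a], label[b])
            if label[a] != low or label[b] != low:
                label[a] = label[b] = low
                changed = True
    return label


# Hypothetical graph with two components: {1, 2, 3} and {4, 5}.
labels = connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)])
```

The number of passes depends on the data, so a single fixed map and reduce pass cannot express the computation.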

7. Thou shalt use a scalable and efficient execution engine

Reliable and robust infrastructure

8. Thou shalt tackle streaming

Integration of low latency jobs

9. Thou shalt provide a common API through the whole framework

Batch? BSP? Streaming? You just write the same code...
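A small plain-Python sketch of the idea, under the assumption that both batch and streaming inputs can be treated as iterables: the same `pipeline` function (a hypothetical name) runs unchanged over a stored list and over an unbounded generator. This is not Stratosphere's API.

```python
from itertools import count, islice


def pipeline(records):
    """Keep even numbers and square them; works on any iterable, finite or not."""
    return (x * x for x in records if x % 2 == 0)


# Batch: a finite, stored input is processed to completion.
batch_result = list(pipeline([1, 2, 3, 4]))

# Streaming: an unbounded input is processed lazily; take the first two results.
stream_result = list(islice(pipeline(count(1)), 2))
```

Only the source and the way results are consumed differ between the two runs; the transformation itself is written once.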

10. Thou shalt support the lambda architecture

Combine the reliability of batch and the speed of streaming to enable real-time queries on large datasets

[Figure: lambda architecture. Input chunks by age: first hour of input, ..., 1 to 2 hours old input, less than an hour old input. Jobs: Batch_1 ... Batch_{n−1}; Streaming_1 ... Streaming_{n−1}, Streaming_n. Result: Output.]
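The combination of a batch result and a streaming result can be sketched in plain Python as follows. This is a minimal illustration that assumes the query is a per-key event count; the names `batch_view`, `SpeedView` and `query` and the sample events are hypothetical and do not come from Stratosphere.

```python
from collections import Counter


def batch_view(chunks):
    """Recompute counts from scratch over all completed input chunks."""
    view = Counter()
    for chunk in chunks:
        view.update(chunk)
    return view


class SpeedView:
    """Counts for the newest chunk, updated one event at a time."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, key):
        self.counts[key] += 1


def query(key, batch, speed):
    """Answer a query by adding the batch result and the streaming result."""
    return batch[key] + speed.counts[key]


# Completed chunks go through the batch path.
batch = batch_view([["a", "b", "a"], ["b", "c"]])

# Events of the chunk that is still arriving go through the streaming path.
speed = SpeedView()
for event in ["a", "c", "c"]:
    speed.on_event(event)

answer = query("a", batch, speed)  # 2 from batch + 1 from streaming = 3
```

The batch path can be rerun at any time to correct errors, while the streaming path keeps the answer current for input that no batch job has covered yet.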

Project info

Where to look for us

Project homepage
The project can be found at stratosphere.eu. The homepage served as a source for the code and most of the pictures presented on these slides.

Data Mining and Search & Big Data BI Groups
The webpages of the Budapest team members’ research groups can be found at dms.sztaki.hu and at bigdatabi.sztaki.hu.

Márton Balassi
mbalassi@ilab.sztaki.hu
