2013 © Trivadis

BASEL BERN BRUGG LAUSANNE ZUERICH DUESSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MUNICH STUTTGART VIENNA

Introduction to Big Data and the Lambda Architecture

Marc Schöni

Meinrad Weiss

April 2014

04.03.2014 Big Data and the Lambda Architecture R 1.00

What is Big Data, why do we care?

The world of data has changed

By 2015, organizations integrating high-value, diverse, new information types and sources into a coherent information management infrastructure will outperform their industry peers financially by more than 20%.

– Gartner, Regina Casonato et al., “Information Management in the 21st Century”

Data explosion

• 10x increase every five years

• 85% from new data types

Consumerization of IT

• 4.3 connected devices per adult

• 27% using social media input

Big Data

[Figure: BI landscape with the areas Personal, Team BI, Corporate BI and Big Data, surrounded by a cloud of technology terms: SQL Server 2012, Analysis Services, DAX, SSRS, Reporting Services, Tabular, Column Store Index, SharePoint, Partitioning, VBScript, MDX, SSIS, SQL Server Data Tools, Integration Services, T-SQL, UDM, PowerPivot, Maps, Power View, Office, Excel, Access, Self-Service BI, PerformancePoint, PDW, PolyBase, SQL, Hive, HDInsight, Fast Track, Appliances, ODBC, OLE DB, C#, Web Services, Azure, Cloud, xVelocity, BISM, SSAS]

Big data solutions deal with complexities of:

• VOLUME (Size)

• VARIETY (Structure)

• VELOCITY (Speed)

Big Data

VALUE

Hadoop/HDInsight

Big Data Patterns

[Chart: data size (Megabytes, Gigabytes, Terabytes, Petabytes) against data complexity (variety and velocity); label: Hadoop/HDInsight]

The modern data warehouse

[Diagram labels: Data sources, Non-Relational Data]

Traditional DW/BI Environment

[Diagram: ETL, Data Warehouse]

Tomorrow's DW/BI Environment

[Diagram: Business Critical data, ETL and the Data Warehouse, extended with HDInsight; new data sources: Sensor Data, Log Data, Automated Data, Social Networks, RFID Data]

What is Hadoop?

Hadoop = MapReduce + HDFS

Distributed, scalable system on commodity hardware composed of:

• HDFS: distributed file system

• MapReduce: programming model

• Others: HBase, R, Pig, Hive, Flume, Mahout, Avro, Zookeeper

[Diagram of the Hadoop ecosystem: HBase (column DB), Hive, Mahout, Oozie, Sqoop, HBase/Cassandra/Couch/MongoDB, Avro, Zookeeper, Pig, Flume, Cascading, R, Ambari, HCatalog]
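
The MapReduce programming model named on this slide can be illustrated with a minimal Python sketch, using word count as the example. It runs in a single process, so it shows only the map, shuffle and reduce steps, not distribution across a cluster. The function names (map_phase, shuffle, reduce_phase, word_count) are illustrative assumptions, not Hadoop APIs.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group emitted values by key (a framework does this between phases)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Hypothetical driver: chain map, shuffle and reduce in one process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

In a real cluster the map and reduce functions run in parallel on different nodes, and the input lines would be read from HDFS rather than passed in as a Python list.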

Hadoop capabilities

• Machine Learning

• Graph Processing

• Distributed Compute

• Extract Load Transform

• Predictive Analysis

Hadoop is not…

• A replacement for the Data Warehouse

• A place to learn how to code (C#)

• A place for low-latency data

Limitations: Analysis with Big Data today

Steep learning curve, slow and inefficient

• Move HDFS into the warehouse before analysis

• Learn new skills

• Build, Integrate, Manage, Maintain, Support

[Diagram labels: Hadoop ecosystem, ETL, SQL]

Microsoft Hadoop Vision

Microsoft Business Intelligence (BI)

• Hive ODBC Connectivity

• BI Tools for Big Data

Better on Windows and Azure

• Active Directory

• System Center

• .Net Programmability

Microsoft Data Connectivity

• SQL Server / SQL Parallel Data Warehouse

• Azure Storage / Azure Data Market

Collaborate with and Contribute to OSS

• Collaborate with HortonWorks

• Provide improvements and Windows support back to OSS

"Microsoft is far ahead of everyone else in terms of what they're contributing back to the community"

– Hortonworks Founder and Architect Arun Murthy

Big Data Lambda Architecture

Big Data Lambda Architecture

• Batch layer

  • Stores master dataset

  • Computes arbitrary views

• Speed layer

  • Fast, incremental algorithms

  • Batch layer eventually overrides speed layer

• Serving layer

  • Random access to batch views

  • Updated by batch layer

The Batch Layer

• Stores master dataset (in append mode)

• Unrestrained computation

• Horizontally scalable

• High latency
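
A minimal Python sketch of the batch layer idea, assuming page-view events as the data. The master dataset only ever grows, and the view is recomputed from scratch over all of it. The PageView schema and the names MasterDataset and compute_batch_view are hypothetical and chosen for illustration; a real batch layer would typically store the data in HDFS and run the computation as a distributed job.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PageView:
    """One immutable fact in the master dataset (hypothetical schema)."""
    timestamp: int
    url: str


class MasterDataset:
    """Append-only store: facts are added, never updated or deleted."""

    def __init__(self) -> None:
        self._events: list[PageView] = []

    def append(self, event: PageView) -> None:
        self._events.append(event)

    def scan(self) -> tuple[PageView, ...]:
        # Return a read-only snapshot so callers cannot mutate the store.
        return tuple(self._events)


def compute_batch_view(master: MasterDataset) -> dict[str, int]:
    """Recompute page views per URL over the whole dataset (simple, but high latency)."""
    return dict(Counter(event.url for event in master.scan()))
```

Because the view is always rebuilt from the raw facts, a bug in the view logic can be fixed by correcting the function and recomputing; the stored data itself is never touched.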

The Speed Layer

• Stream processing of data

• Stores a limited window of data

• Dynamic computation
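
A minimal Python sketch of a speed layer, assuming each event carries an integer time bucket and a URL. Counts are updated incrementally per event, and buckets are dropped once a batch run covers them, which keeps the stored window limited. The class SpeedLayer and its methods are hypothetical names for illustration, not the API of any stream-processing product.

```python
from collections import defaultdict


class SpeedLayer:
    """Keeps per-URL counts only for events the batch layer has not absorbed yet."""

    def __init__(self) -> None:
        # Nested mapping: time bucket -> URL -> count.
        self._counts: dict[int, dict[str, int]] = defaultdict(lambda: defaultdict(int))

    def on_event(self, bucket: int, url: str) -> None:
        """Incremental update: touch exactly one counter per incoming event."""
        self._counts[bucket][url] += 1

    def expire_through(self, batch_horizon: int) -> None:
        """Drop every bucket that the latest batch view already covers."""
        for bucket in [b for b in self._counts if b <= batch_horizon]:
            del self._counts[bucket]

    def realtime_view(self) -> dict[str, int]:
        """Sum the remaining buckets into one count per URL."""
        totals: dict[str, int] = defaultdict(int)
        for per_url in self._counts.values():
            for url, count in per_url.items():
                totals[url] += count
        return dict(totals)
```

Calling expire_through after each batch run is one way to let the batch results replace the speed layer's approximate state over time.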

The Serving Layer

• Queries the batch and real-time views

• Merges the results
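
A minimal Python sketch of the query-and-merge step, assuming both views are plain mappings from URL to count and that the real-time view holds only events not yet contained in the batch view. The function names merge_views and query are hypothetical. Simple addition works here because counts are additive; other metrics (for example distinct visitors) would need a different merge rule.

```python
def merge_views(batch_view: dict[str, int], realtime_view: dict[str, int]) -> dict[str, int]:
    """Combine both views into one result without modifying either input."""
    merged = dict(batch_view)
    for url, count in realtime_view.items():
        merged[url] = merged.get(url, 0) + count
    return merged


def query(url: str, batch_view: dict[str, int], realtime_view: dict[str, int]) -> int:
    """Answer a single-key query by adding the batch and real-time counts."""
    return batch_view.get(url, 0) + realtime_view.get(url, 0)
```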

Microsoft Lambda Architecture Support

Example: Structured & Unstructured Data

• Extremely large volume of unstructured web logs

• Ad hoc analysis of logs to prototype patterns

• Hadoop data cluster feeds large 24 TB cube

• Business users analyze cube data

[Diagram: Apache Hadoop and SQL Server Analysis Services (SSAS). Components: Hadoop Data, Third Party Database, Staging Database, SQL Server Connector (Hadoop Hive ODBC), SQL Server Analysis Services (SSAS Cube). Consumers: Microsoft Excel and PowerPivot, Other BI Tools and Custom Applications]

Questions?