2013 © Trivadis
BASEL BERN BRUGG LAUSANNE ZUERICH DUESSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MUNICH STUTTGART VIENNA
Introduction to Big Data
and the Lambda Architecture
Marc Schöni
Meinrad Weiss
April 2014
04.03.2014 Big Data and the Lambda Architecture R 1.00
What is Big Data, why do we care?
The world of data has changed
By 2015, organizations integrating high-value, diverse, new information types and sources into a coherent information management infrastructure will outperform their industry peers financially by more than 20%.
– Gartner, Regina Casonato et al., “Information Management in the 21st Century”
Data explosion
10x increase every five years
85% from new data types
Consumerization of IT
4.3 connected devices per adult
27% using social media input
Personal, Team BI, Corporate BI, Big Data
SQL Server 2012 Analysis Services
DAX SSRS Reporting Services Tabular Column Store Index SharePoint Partitioning
VBScript MDX SSIS SQL Server Data Tools
Integration Services T-SQL UDM PowerPivot Maps
Power View Office Excel Access Self-Service BI PerformancePoint
PDW PolyBase SQL Hive
HDInsight Fast Track Appliances
ODBC OLE DB C# Web Services
Azure Cloud xVelocity BISM SSAS
Big data solutions deal with complexities of:
VOLUME (Size)
VARIETY (Structure)
VELOCITY (Speed)
Big Data
VALUE
Hadoop/HDInsight
Data Complexity: Variety and Velocity
Megabytes, Gigabytes, Terabytes, Petabytes
Big Data Patterns
Hadoop/HDInsight
The modern data warehouse
Data sources
Non-Relational Data
Tomorrow's DW/BI Environment
Business Critical
Data Warehouse
ETL
HDInsight
Sensor Data, Log Data, Automated Data, Social Networks, RFID Data
What is Hadoop?
Hadoop = MapReduce + HDFS
Distributed, scalable system on commodity hardware composed of:
HDFS: distributed file system
MapReduce: programming model
Others: HBase, R, Pig, Hive, Flume, Mahout, Avro, ZooKeeper
Ecosystem components shown: HBase (column DB), Hive, Mahout, Oozie, Sqoop, HBase/Cassandra/Couch/MongoDB, Avro, ZooKeeper, Pig, Flume, Cascading, R, Ambari, HCatalog
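The MapReduce programming model named on this slide can be sketched in a few lines of plain Python. This is only an in-memory illustration of the map, shuffle, and reduce steps using a word count; it is not Hadoop's API, and the function names are chosen here for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # In a real cluster each line (or file block) would be mapped on a
    # different node; here everything runs in one process.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big value", "data on commodity hardware"]))
```

In Hadoop the input would live in HDFS and the framework would distribute the map and reduce calls across nodes; only the two user-supplied functions correspond to what a developer writes.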
Hadoop capabilities
Machine Learning
Graph Processing
Distributed Compute
Extract, Load, Transform
Predictive Analysis
Hadoop is not…
A replacement for the Data Warehouse
A place to learn how to code C#
A place for low latency data
Limitations: Analysis with Big Data today
Steep learning curve, slow and inefficient
Move HDFS into the warehouse before analysis
ETL
Hadoop ecosystem
Learn new skills
SQL
Build, Integrate, Manage, Maintain, Support
Microsoft Hadoop Vision
Microsoft Business Intelligence (BI)
• Hive ODBC Connectivity
• BI Tools for Big Data
Better on Windows and Azure
• Active Directory
• System Center
• .NET Programmability
Microsoft Data Connectivity
• SQL Server / SQL Parallel Data Warehouse
• Azure Storage / Azure Data Market
Collaborate with and Contribute to OSS
• Collaborate with Hortonworks
• Provide improvements and Windows support back to OSS
"Microsoft is far ahead of everyone else in terms of what they're contributing back to the community"
– Hortonworks Founder and Architect Arun Murthy
Big Data Lambda Architecture
Big Data Lambda Architecture
• Batch layer
  • Stores master dataset
  • Computes arbitrary views
• Speed layer
  • Fast, incremental algorithms
  • Batch layer eventually overrides speed layer
• Serving layer
  • Random access to batch views
  • Updated by batch layer
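How the three layers hand data to each other can be sketched as a small in-memory Python class. This is a toy model under stated assumptions, not a reference implementation: the class name `LambdaSketch`, its methods, and the use of integer timestamps are all hypothetical. The point it illustrates is the last speed-layer bullet: once a batch run has covered an event, the speed layer's copy is discarded, so nothing is counted twice.

```python
from collections import Counter


class LambdaSketch:
    """Toy model of the batch, speed, and serving layers for event counts."""

    def __init__(self):
        self.master = []             # batch layer: append-only (timestamp, key) records
        self.batch_view = Counter()  # serving layer: view produced by the last batch run
        self.recent = []             # speed layer: events no batch run has covered yet

    def ingest(self, timestamp, key):
        # Every event goes to both the master dataset and the speed layer.
        self.master.append((timestamp, key))
        self.recent.append((timestamp, key))

    def run_batch(self, up_to):
        # Recompute the view from the whole master dataset (high latency in practice).
        self.batch_view = Counter(k for t, k in self.master if t <= up_to)
        # The batch result overrides the speed layer for everything it covers.
        self.recent = [(t, k) for t, k in self.recent if t > up_to]

    def query(self, key):
        # Batch view plus whatever the batch layer has not processed yet.
        return self.batch_view[key] + sum(1 for _, k in self.recent if k == key)


if __name__ == "__main__":
    system = LambdaSketch()
    for ts, key in [(1, "a"), (2, "a"), (3, "b")]:
        system.ingest(ts, key)
    print(system.query("a"))   # answered from the speed layer only
    system.run_batch(up_to=2)
    print(system.query("a"))   # same answer, now served from the batch view
```

The query result for a key is the same before and after `run_batch`; only the layer that supplies it changes.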
The Batch Layer
• Stores master dataset (in append mode)
• Unrestrained computation
• Horizontally scalable
• High latency
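A minimal Python sketch of the batch-layer idea, assuming a simple click-event record with hypothetical `user` and `url` fields: the master dataset is only ever appended to, and each view is a plain function recomputed over all of it. Two different views over the same data illustrate the "unrestrained computation" point.

```python
from collections import defaultdict

master_dataset = []  # append-only: records are added, never updated or deleted


def append_event(user, url):
    """Store one immutable fact in the master dataset."""
    master_dataset.append({"user": user, "url": url})


def page_views(events):
    """Batch view 1: number of views per URL."""
    views = defaultdict(int)
    for event in events:
        views[event["url"]] += 1
    return dict(views)


def unique_visitors(events):
    """Batch view 2: number of distinct users per URL."""
    visitors = defaultdict(set)
    for event in events:
        visitors[event["url"]].add(event["user"])
    return {url: len(users) for url, users in visitors.items()}


if __name__ == "__main__":
    for user, url in [("ann", "/home"), ("bob", "/home"), ("ann", "/home"), ("ann", "/cart")]:
        append_event(user, url)
    # Both views are recomputed from scratch over the full dataset.
    print(page_views(master_dataset))
    print(unique_visitors(master_dataset))
```

Because the raw events are kept unchanged, a new or corrected view function only needs another full recomputation; the price is the high latency noted on the slide.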
The Speed Layer
• Stream processing of data
• Stores a limited window of data
• Dynamic computation
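The "limited window" and incremental-update ideas can be sketched as follows. This is an illustrative Python toy, not a stream-processing framework: the class name `SpeedLayer`, the numeric timestamps in seconds, and the assumption that events arrive in timestamp order are all choices made for this example.

```python
from collections import Counter, deque


class SpeedLayer:
    """Keeps per-key counts for the last `window_seconds` of events only."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()           # (timestamp, key), oldest first
        self.realtime_view = Counter()  # updated incrementally, never recomputed

    def _expire(self, now):
        # Drop events that have left the window and decrement their counts.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, key = self.events.popleft()
            self.realtime_view[key] -= 1
            if self.realtime_view[key] == 0:
                del self.realtime_view[key]

    def ingest(self, timestamp, key):
        # Assumes timestamps are non-decreasing.
        self._expire(timestamp)
        self.events.append((timestamp, key))
        self.realtime_view[key] += 1

    def count(self, key, now):
        self._expire(now)
        return self.realtime_view[key]


if __name__ == "__main__":
    layer = SpeedLayer(window_seconds=60)
    layer.ingest(0, "a")
    layer.ingest(30, "a")
    layer.ingest(50, "b")
    print(layer.count("a", now=59))
    print(layer.count("a", now=100))
```

Each event costs a constant amount of work on arrival and on expiry, which is what keeps the layer fast compared with a full recomputation.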
The Serving Layer
• Queries the batch and real-time views
• Merges the results
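For additive metrics such as counts, the merge step can be as simple as summing the two views. The sketch below assumes that both views are plain mappings from key to count and that they cover disjoint sets of events; the function names and sample numbers are hypothetical.

```python
from collections import Counter


def merge_views(batch_view, realtime_view):
    """Add precomputed batch counts to counts for data the batch run has not seen yet."""
    return Counter(batch_view) + Counter(realtime_view)


def top_pages(batch_view, realtime_view, n):
    """Answer a query from the merged result: the n most viewed pages."""
    return merge_views(batch_view, realtime_view).most_common(n)


if __name__ == "__main__":
    batch_view = {"/home": 120, "/cart": 45}
    realtime_view = {"/home": 3, "/checkout": 2}
    print(top_pages(batch_view, realtime_view, 2))
```

Metrics that are not additive, such as distinct counts, need a merge function of their own; simple addition would overcount.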
Microsoft Lambda Architecture Support
E.g. STRUCTURED & UNSTRUCTURED DATA
Extremely large volume of unstructured web logs
Ad hoc analysis of logs to prototype patterns
Hadoop data cluster feeds large 24 TB cube
Business users analyze cube data
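The step from unstructured web logs to data a cube can consume might look like the following Python sketch. It assumes log lines in the common Apache access-log layout, which the slides do not specify, and aggregates request counts by day, path, and status as an example of cube-like cells.

```python
import re
from collections import Counter

# Assumed layout: ip ident user [day:time zone] "METHOD path protocol" status size
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def parse(line):
    """Turn one raw log line into a structured record, or None if it does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


def aggregate(lines):
    """Count requests per (day, path, status); unparseable lines are skipped."""
    cells = Counter()
    for line in lines:
        record = parse(line)
        if record:
            cells[(record["day"], record["path"], record["status"])] += 1
    return cells


if __name__ == "__main__":
    sample = [
        '10.0.0.1 - - [04/Mar/2014:10:15:32 +0100] "GET /home HTTP/1.1" 200 512',
        '10.0.0.2 - - [04/Mar/2014:10:16:01 +0100] "GET /home HTTP/1.1" 200 498',
        'not a log line',
    ]
    print(aggregate(sample))
```

At the scale described on the slide, the same parse and aggregate logic would run as a Hadoop job, and only the aggregated cells would be loaded into the cube.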
Apache Hadoop
SQL Server Analysis Services (SSAS)
Microsoft Excel and PowerPivot
Other BI Tools and Custom Applications
Hadoop Data
Third Party Database
SQL Server
Analysis Services (SSAS Cube)
+
Custom
Applications
SQL Server Connector (Hadoop Hive ODBC)
Staging Database