big data is hard…download.microsoft.com/download/5/9/e/59e3dfd2-abd0-49b4-ae7e... ·...
TRANSCRIPT
Traditionally, analytics have been over pre-defined structures
Data characteristics:
Questions answered with BI and visualizations:
Customer
Sales
Product
To innovate, new types of data and analytics are needed
Data characteristics:
Questions from exploratory analytics:
Data complexity: variety and velocity
Petabytes
Customer
Sales
Product
Two Approaches to Analytics
Observation
Pattern
Theory
Hypothesis
Predictive Analytics: What will happen?
Prescriptive Analytics: How can we make it happen?
Descriptive Analytics: What happened?
Diagnostic Analytics: Why did it happen?
Top-Down: Theory → Hypothesis → Observation → Confirmation
ETL pipeline
Dedicated ETL tools (e.g. SSIS)
Defined schema
Queries
Results
Relational
LOB Applications
Traditional business analytics process
1. Start with end-user requirements to identify desired reports and analysis
2. Define corresponding database schema and queries
3. Identify the required data sources
4. Create an Extract-Transform-Load (ETL) pipeline to extract required data (curation) and transform it to target schema ('schema-on-write')
5. Create reports. Analyze data
All data not immediately required is discarded or archived
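The schema-on-write steps above can be sketched in a few lines of Python. This is a minimal illustration only: the `RAW_ROWS` records, the `sales` table, and the `run_etl` helper are hypothetical names, and SQLite stands in for whatever relational target a real pipeline would load.

```python
import sqlite3

# Hypothetical raw records from a line-of-business source.
RAW_ROWS = [
    {"customer": "Ada", "product": "Widget", "amount": "19.99", "clickstream": "a>b>c"},
    {"customer": "Bob", "product": "Gadget", "amount": "5.00", "clickstream": "x>y"},
]

def run_etl(rows):
    """Extract rows, transform them to the target schema, and load them."""
    conn = sqlite3.connect(":memory:")
    # The schema is fixed up front, before any data is loaded.
    conn.execute("CREATE TABLE sales (customer TEXT, product TEXT, amount REAL)")
    for row in rows:
        # Only the fields the reports need survive; 'clickstream' is discarded.
        conn.execute(
            "INSERT INTO sales VALUES (?, ?, ?)",
            (row["customer"], row["product"], float(row["amount"])),
        )
    conn.commit()
    return conn

conn = run_etl(RAW_ROWS)
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(round(total, 2))
```

Note that the `clickstream` field never reaches the target table, which mirrors the point that data not immediately required is dropped.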
Gather data from all sources → Store indefinitely → Analyze → See results → Iterate
New big data thinking: All data has value
All data has potential value
Data hoarding
No defined schema—stored in native format
Schema is imposed and transformations are done at query time (schema-on-read).
Apps and users interpret the data as they see fit
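Schema-on-read can be sketched the same way. In this minimal example the raw store keeps every record in its native format, and each query supplies its own list of fields. The `RAW_STORE` records and the `read_with_schema` helper are hypothetical and only illustrate the idea.

```python
import json

# Hypothetical raw events kept in their native format (JSON lines).
RAW_STORE = [
    '{"customer": "Ada", "product": "Widget", "amount": 19.99}',
    '{"device": "sensor-7", "temp_c": 21.5}',
    '{"customer": "Bob", "product": "Gadget", "amount": 5.0, "coupon": "X1"}',
]

def read_with_schema(lines, fields):
    """Impose a schema at query time: yield only records that have every field."""
    for line in lines:
        record = json.loads(line)
        if all(f in record for f in fields):
            yield {f: record[f] for f in fields}

# Two consumers interpret the same stored data in different ways.
sales = list(read_with_schema(RAW_STORE, ["customer", "amount"]))
devices = list(read_with_schema(RAW_STORE, ["device", "temp_c"]))
print(sales)
print(devices)
```

Nothing is discarded at load time, so a later question about coupons or devices can still be answered from the same store.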
Azure Data Platform
[Diagram: labels recovered from the Azure Data Platform architecture figure]
On-premises and device sources: file data, IoT, device data, log data, stream data, transactional data, enterprise data, Hadoop, SQL, MPP/APS, apps (iOS/Android)
Connectivity: on-premises VPN device, VPN Gateway, Cloud Gateway, MPLS, ExpressRoute, Event Hub, SQL Data Sync, Data Management Service, Data Management Gateway
Processing and analytics: Data Factory, Logic Apps, Cloud Services Worker Role, Stream Analytics, Azure Data Catalog, Azure Batch, Machine Learning, Power BI, Cortana Analytics Suite
Storage: DocDB, storage blob, storage table, storage queue, MySQL Database, Azure SQL Database, Azure SQL Data Warehouse, HDInsight (Hadoop), Azure Data Lake
Azure Data Lake
Introducing Microsoft Azure Data Lake
[Diagram: Microsoft Azure Data Lake. Layers: Analytics Service (U-SQL) and HDInsight, YARN, Store (HDFS)]
Product Details
Azure Data Lake store
Azure Data Lake analytics service
Azure HDInsight
Introducing Azure Data Lake Store
No fixed limits on file size (petabyte-scale files)
Designed for a diversity of analytic workloads
Accessible to all HDFS-compliant analytic applications (Hortonworks, Cloudera, MapR)
Managed, monitored, and supported by Microsoft
Enterprise-grade features for security, compliance, and management
Azure Data Lake Analytics Service
Distributed analytics service
Dynamically scales to meet your business needs
Productive from day one with industry-leading development tools (for novices and experts)
Analytics over all data (unstructured, semi-structured, structured)
U-SQL: simple and familiar, easily extensible
Hive coming soon
Built on open standards (YARN)
Azure HDInsight becomes a key part of Data Lake
Microsoft’s cloud Hadoop offering
100% open source Apache Hadoop
Fully managed and supported by Microsoft
Spark, Hive, Pig, Storm, HBase
Up and running in minutes with no hardware
.NET and Java skills
Deep integration with Visual Studio
99.9% Enterprise Service Level Agreement
Use Windows or Linux
Azure HDInsight Includes Spark
Single execution model for multiple tasks (SQL queries, streaming, machine learning, and graph)
Up to 100x faster processing (compared with Hadoop MapReduce for in-memory workloads)
Developer friendly (Java, Python, Scala)
BI tool of choice (Power BI, Tableau, Qlik, SAP)
Notebook experience (Jupyter/iPython, Zeppelin)
Azure HDInsight Includes Storm
Consumes millions of real-time events from a scalable event broker (e.g. Apache Kafka, Azure Event Hub)
Performs time-sensitive computation
Output to persistent stores, dashboards or devices
Customizable with Java + .NET
Deeply integrated with Visual Studio
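As a rough illustration of what a time-sensitive computation over an event stream can look like, the sketch below counts events per fixed (tumbling) time window in plain Python. It does not use the Storm API. A real topology would express this with spouts and bolts over an unbounded stream, and the `EVENTS` data and the `tumbling_window_counts` helper are hypothetical.

```python
from collections import Counter

def tumbling_window_counts(events, window_seconds):
    """Count events per (window start, key) for fixed, non-overlapping windows."""
    counts = Counter()
    for timestamp, key in events:
        # Map each timestamp to the start of the window that contains it.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)

# Hypothetical (timestamp in seconds, device id) events.
EVENTS = [(0, "a"), (3, "a"), (7, "b"), (10, "a"), (12, "b"), (19, "b")]
counts = tumbling_window_counts(EVENTS, 10)
print(counts)
```

The per-window counts are the kind of result that would then be written to a persistent store or pushed to a dashboard.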
Azure HDInsight Includes HBase
Columnar, NoSQL database
Runs on top of the Hadoop Distributed File System (HDFS)
Provides flexibility in that new columns can be added to column families at any time
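The column-family flexibility described above can be pictured with a toy in-memory model. This is not the HBase client API: `ToyTable` and its methods are hypothetical and assume only that column families are declared with the table while column qualifiers within a family are free-form.

```python
from collections import defaultdict

class ToyTable:
    """Toy model of a column-family table; not the HBase client API."""

    def __init__(self, families):
        # Column families are declared when the table is created.
        self.families = set(families)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # Any qualifier is accepted, so a new column needs no schema change.
        self.rows[row_key][family][qualifier] = value

    def get(self, row_key, family):
        return dict(self.rows[row_key][family])

table = ToyTable(["info"])
table.put("user1", "info", "name", "Ada")
table.put("user2", "info", "name", "Bob")
table.put("user2", "info", "twitter", "@bob")  # new column, added on the fly
print(table.get("user1", "info"))
print(table.get("user2", "info"))
```

Rows in the same table can therefore carry different sets of columns, which is the flexibility the slide refers to.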
ADL Store: Ingress
Data can be ingested into Azure Data Lake Store from a variety of sources
Sources: server logs, Azure Storage Blobs, custom programs, Azure SQL DB, Azure SQL DW, Azure Tables (Table Storage), on-premises databases (SQL)
Ingestion tools: Azure Event Hub, Apache Flume, built-in copy service, .NET SDK, JavaScript CLI, Azure Portal, Azure PowerShell, Azure Data Factory, Apache Sqoop
ADL Store: Egress
Data can be exported from Azure Data Lake Store into numerous targets/sinks
Targets: Azure SQL DB, Azure SQL DW, Azure Tables (Table Storage), on-premises databases (SQL), Azure Storage Blobs, custom programs
Export tools: Azure Data Factory, Apache Sqoop, built-in copy service, .NET SDK, JavaScript CLI, Azure Portal, Azure PowerShell
Learn More
http://azure.microsoft.com/en-us/documentation/services/hdinsight/
http://azure.microsoft.com/en-us/documentation/articles/hdinsight-learn-map/
http://www.microsoftvirtualacademy.com/training-courses/getting-started-with-microsoft-big-data
http://channel9.msdn.com/Shows/Data-Exposed
http://azure.microsoft.com/en-us/pricing/free-trial/