Big Data in Azure
Matthew Winter, Azure Global Black Belt
31st August 2016
Big Data is Changing Traditional Data Warehousing
… data warehousing has reached the most significant tipping point since its inception. The biggest, possibly most elaborate data management system in IT is changing. – Gartner, “The State of Data Warehousing”*
* Donald Feinberg, Mark Beyer, Merv Adrian, Roxane Edjlali (Gartner), The State of Data Warehousing in 2012 (Stamford, CT.: Gartner, 2012)
[Diagram: Data Sources (OLTP, ERP, CRM, LOB) → ETL → Data Warehouse → BI and Analytics]
Big Data is Driving Transformative Changes
Traditional vs. Big Data
Data characteristics
• Traditional: Relational data with highly modeled schema
• Big Data: All data with schema agility
Costs
• Traditional: Specialized hardware
• Big Data: Commodity hardware
Culture
• Traditional: Operational reporting, with a focus on rear-view analysis
• Big Data: Experimentation leading to intelligent action, with machine learning, graph, A/B testing
Big Data Introduces a Culture of Experimentation
Tangerine instantly adapts to customer feedback to offer customers what they want, when they want it
“I can see us…creating predictive, context-aware financial services applications that give information based on the time and where the customer is.”
Billy Lo, Head of Enterprise Architecture
Scenario
• Lack of insight for targeted campaigns
• Inability to support data growth
Solution
Azure HDInsight (Hadoop-as-a-service) with the Analytics Platform System (APS) enables instant analysis of social sentiment and customer feedback across digital, face-to-face and phone.
Result
• Reduced time to customer insight
• Ability to make changes to campaigns or adjust product rollouts based on real-time customer reactions
• Ability to offer incentives and new services to retain, and grow, its customer base
However, there are challenges to Big Data…
Obtaining skills and capabilities
Determining how to get value
Integrating with existing IT investments
*Gartner: Survey Analysis – Hadoop Adoption Drivers and Challenges (Stamford, CT.: Gartner, 2015)
But, Microsoft has done it before
We needed to better leverage data and analytics to do more experimentation
So we:
• Designed a data lake for everyone to put their data into
• Built tools approachable by any developer
• Created machine learning tools for collaborating across large experiment models
Result:
• Across Microsoft, ten thousand developers doing experimentation leading to better insights
• Leading to growth in our Microsoft businesses:
  • Office productivity revenue (45% YoY)*
  • Intelligent Cloud (100% YoY)*
  • Bing search share doubles
[Chart: Growth of data @ Microsoft, 2010 to 2015, scale from petabytes to exabytes. Sources shown: Windows, SMSG, Live, Bing, CRM/Dynamics, Xbox Live, Office365, Malware Protection, Microsoft Stores, Commerce Risk, Skype, LCA, Exchange, Yammer]
* Microsoft. FY16 Q4 Results, URL: http://www.microsoft.com/en-us/Investor/earnings/FY-2016-Q4/press-release-webcast
Microsoft is now taking everything we’ve learned on this journey
and bringing it to our customers
Technology. Cost. Culture.
Big Data as a Cornerstone of Cortana Intelligence
[Diagram: Cortana Intelligence layers]
Action: People; Automated Systems; Apps (Web, Mobile, Bots)
Intelligence: Cortana, Bot Framework, Cognitive Services
Dashboards & Visualizations: Power BI
Information Management: Event Hubs, Data Catalog, Data Factory
Machine Learning and Analytics: HDInsight (Hadoop / Spark), Stream Analytics, Data Lake Analytics, Machine Learning
Big Data Stores: SQL Data Warehouse, Data Lake Store
Data Sources: Apps, Sensors and devices, Data
Azure HDInsight
Hadoop and Spark as a Service on Azure
• Fully-managed Hadoop and Spark for the cloud
• 100% Open Source Hortonworks data platform
• Clusters up and running in minutes
• Managed, monitored and supported by Microsoft with the industry’s best SLA
• Familiar BI tools for analysis, or open source notebooks for interactive data science
• 63% lower TCO than deploying your own Hadoop on-premises*
*IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”
Comprehensive Set of Managed Apache Big Data Projects
• Scale to petabytes on demand
• Process unstructured and semi-structured data
• Develop in Java, .NET, and more
• Skip buying and maintaining hardware
• Deploy in Windows or Linux
• Spin up an Apache Hadoop cluster in minutes
• Visualize your Hadoop data in Excel
• Easily integrate on-premises Hadoop clusters
Core Engine
• Batch: MapReduce
• Script: Pig
• SQL: Hive
• NoSQL: HBase
• Streaming: Storm
• In-Memory: Spark
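The batch engine listed above, MapReduce, splits a job into a map step, a grouping (shuffle) step, and a reduce step. The following is a minimal single-process sketch of that model in plain Python, not HDInsight or Hadoop code; the function names and the sample lines are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, standing in for files stored on a cluster.
counts = word_count(["big data in azure", "big data stores"])
```

On a real cluster the map and reduce steps run in parallel across many nodes and the shuffle moves data between them; the sketch only shows the shape of the computation.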
Azure Data Lake Store
A Hyper-Scale Repository for Big Data Analytics Workloads
• Hadoop File System (HDFS) for the cloud
• No limits to scale
• Store any data in its native format
• Enterprise-grade access control, encryption at rest
• Optimized for analytic workload performance
Azure Data Lake Store
• Distributed, parallel file system in the cloud
• Performance-tuned and optimized for analytics
• No fixed size limits
• Stores all data types
• Highly available with local & geo redundant storage
• WebHDFS REST API
• Supported by leading Hadoop distros
• Role-based security
• Low latency and high throughput workloads
[Diagram labels: YARN, HDFS, HDInsight, Analytics Service, Store, U-SQL. Data sources shown: Clickstream, Sensors, Video, Social, Web, Devices, Relational, Applications]
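Because Data Lake Store is described as exposing a WebHDFS REST API, requests follow the usual WebHDFS shape: a path under /webhdfs/v1/ plus an op query parameter. The sketch below only builds such a URL and makes no network call. The host pattern, the account name, and the folder path are assumptions for illustration, and authentication (which a real request would need) is left out.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(account, path, operation, **params):
    """Build a WebHDFS-style URL for a file or folder in a store account.

    Assumption: the store is reachable at <account>.azuredatalakestore.net.
    """
    host = f"{account}.azuredatalakestore.net"
    # WebHDFS selects the action with the "op" query parameter.
    query = urlencode({"op": operation, **params})
    # quote() keeps "/" separators and escapes other unsafe characters.
    return f"https://{host}/webhdfs/v1/{quote(path.lstrip('/'))}?{query}"


# Hypothetical account and folder; LISTSTATUS is the standard WebHDFS
# operation for listing a directory.
url = webhdfs_url("exampleaccount", "/clickstream/2016/08", "LISTSTATUS")
```

Keeping URL construction in one small function makes it easy to swap the host or add parameters without touching the calling code.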
Azure Data Lake Analytics
A new distributed analytics service
• Distributed analytics service built on Apache YARN
• Elastic scale per query lets users focus on business goals, not configuring hardware
• Includes U-SQL, a language that unifies the benefits of SQL with the expressive power of C#
• Integrates with Visual Studio to develop, debug, and tune code faster
• Federated query across Azure data sources
• Enterprise-grade role based access control
Typical Azure Big Data Architecture
[Diagram components: Data sources (Apps, Sensors and devices), Azure API Management, Backend Services, Event Hubs, Stream Analytics, Machine Learning, HDInsight (Apache Spark), Storage, SQL Data Warehouse, Power BI, Azure Data Factory & Azure Data Catalog]
Highest availability guarantee in the industry for peace of mind
• Managed, monitored and supported by Microsoft
• Enterprise-leading SLA—99.9% uptime
• No IT resources needed for upgrades and patching
• Microsoft monitors your deployment so you don’t have to
*Applies to HDInsight only
99.9% SLA
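To put the 99.9% uptime figure in concrete terms, the arithmetic below converts an SLA percentage into the downtime it would still permit. This is a back-of-the-envelope sketch only: the 30-day month and the function name are assumptions, and the actual SLA terms define how uptime is measured.

```python
def allowed_downtime_minutes(sla_percent, days=30):
    """Return the minutes of downtime an uptime percentage permits.

    Assumption: the measurement period is a flat `days`-day window.
    """
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


# 99.9% over an assumed 30-day month leaves roughly 43 minutes of downtime.
monthly_minutes = allowed_downtime_minutes(99.9)
# The same percentage over an assumed 365-day year, expressed in hours.
yearly_hours = allowed_downtime_minutes(99.9, days=365) / 60
```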
Runs in the Most Datacenters Worldwide
Azure doubling compute and storage every 6 months
*Applies to HDInsight only
• Central US: Iowa
• West US: California
• East US: Virginia
• North Central US: Illinois
• South Central US: Texas
• Brazil South: Sao Paulo State
• West Europe: Netherlands
• China North*: Beijing
• China South*: Shanghai
• Japan East: Tokyo, Saitama
• Japan West: Osaka
• East Asia: Hong Kong
• SE Asia: Singapore
• Australia South East: Victoria
• Australia East: New South Wales
• India Central: Pune
• North Europe: Ireland
• East US 2: Virginia
Lower Total Cost of Ownership
• No hardware
• Hadoop support included with Azure support
• Pay only for what you use
• Independently scale storage and compute
• No need to hire specialized operations team
• 63% lower total cost of ownership than on-premises*
*IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”
Recognized by Top Analysts
Forrester Wave for Big Data Hadoop Cloud
• Named industry leader by Forrester with the most comprehensive, scalable, and integrated platforms*
• Recognized for its cloud-first strategy that is paying off*
*The Forrester Wave™: Big Data Hadoop Cloud Solutions, Q2 2016.
Microsoft Data Science Summit
Get hands-on with the latest cutting edge technologies with Big Data, Machine Learning and Open Source at the Microsoft Data Science Summit.
Hear from thought leaders, data scientists, engineers and customers solving real world problems, and make expert connections to help you put these technologies to work for your business.
September 26-27, 2016
Atlanta, GA
Register Now! aka.ms/microsoftdatasciencesummit
Target audience
• Data Scientists
• Big Data Engineers
• Machine Learning Practitioners/Engineers
• Data Science/Engineering Managers
Why attend
• Readiness with architectural guidance & hands-on training to operationalize solutions at scale
• Real world examples of how to apply machine learning & data science techniques to your business
• Networking with the experts and the community to bring your data to life
© 2016 Microsoft Corporation. All rights reserved.