Company Confidential - For Internal Use Only. Copyright © 2014, SAS Institute Inc. All rights reserved.
SAS ON HADOOP KNOWLEDGE BOOSTER
Michel Philippens, Frederik Vandenberghe, Edwin van Waes
STRATEGIC GOALS AND OBJECTIVES 2015-2018
BY 2018
• 50% of overall business = partner business
• 20% of overall business = Luxembourg
• Territory sales as growth engine for the future
THE CHALLENGE AHEAD
BREAK
BASE CAMP
• Introduction Hadoop
• Hadoop: a new data platform
• SAS on Hadoop
• Early customer cases
INTRODUCTION HADOOP
HADOOP FUNDAMENTALS
[Figure: Hadoop cluster with a head node and data nodes]
HADOOP : AN IMPORTANT ENABLER FOR BIG DATA ANALYTICS
SELECTION OF HADOOP PROJECTS
HADOOP DISTRIBUTOR PARTNERSHIPS
Intel recently invested $740 million to buy 18% of Cloudera, which puts Cloudera's value at around the $4 billion mark!
HP recently invested $50 million into Hortonworks to get a seat on the board. Total investment is now about $300 million.
Big Teradata and SAP partners!
Google Capital recently invested $80 million into MapR; they gathered $110 million of investment in their last round!
IBM InfoSphere BigInsights
Amazon Elastic MapReduce (EMR)
Pivotal HD: GE invested $105 million in Pivotal
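As a rough check of the Intel figure, the implied valuation is the amount invested divided by the stake acquired (a simplification that ignores how the deal was actually structured):

```latex
V \approx \frac{\$740\ \text{million}}{0.18} \approx \$4.1\ \text{billion}
```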
HADOOP AS A NEW DATA PLATFORM
IT AND BUSINESS DRIVERS FOR HADOOP
Hadoop as a Data Platform (standalone or as part of a broader ecosystem)
Hadoop as a component of the next generation of Business Analytics
... to support innovative use cases
... to support an IT transformation
[Figure: analytics lifecycle: Manage Data, Explore Data, Develop Models, Deploy & Monitor]
WHERE ARE WE TODAY?
[Figure: operational data sources feeding an EDW, data marts and analytic marts]
• Some isolated islands of analytics
• Traditional BI and analytics
• Traditional DWH: not agile, purpose-built, not detailed enough, high cost, slow
• New types of data: external & unstructured data
• Increased appetite for analytics & innovation
WHERE ARE WE HEADING?
[Figure: operational data sources, EDW and data marts feeding traditional BI and analytics]
• Low cost, agile (schema on read), high performance
• Hadoop seen as analytics enabler
• Business can think "big" again
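"Schema on read" means data is stored as it arrives and a structure is applied only when it is queried. A minimal Python sketch under that assumption, using hypothetical CSV records: two consumers read the same raw lines with different schemas, and a value that does not fit a schema becomes None rather than blocking the load (similar in spirit to Hive returning NULL for fields it cannot parse).

```python
import csv

# Hypothetical raw records, stored exactly as they arrived (no schema enforced on write).
RAW_LINES = [
    "2015-02-12,BE,120.50",
    "2015-02-12,LU,80.00",
    "2015-02-13,BE,n/a",
]


def read_with_schema(lines, schema):
    """Apply a schema at read time.

    `schema` is a list of (column_name, converter) pairs. A field that cannot
    be converted becomes None instead of failing the whole read.
    """
    rows = []
    for record in csv.reader(lines):
        row = {}
        for (name, convert), value in zip(schema, record):
            try:
                row[name] = convert(value)
            except ValueError:
                row[name] = None  # unparsable for this schema; the raw line itself is unchanged
        rows.append(row)
    return rows


# Two consumers interpret the same raw data differently.
SALES_SCHEMA = [("day", str), ("country", str), ("amount", float)]
AUDIT_SCHEMA = [("day", str), ("country", str), ("amount_text", str)]

if __name__ == "__main__":
    for row in read_with_schema(RAW_LINES, SALES_SCHEMA):
        print(row)
```

Nothing is rejected at write time; each reader decides how strictly to interpret the stored lines.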
MULTIPLE DATA ARCHITECTURES
• As new data store
• As additional input to the EDW
• As foundation for BI & Analytics
• As staging layer
THE ELEPHANT WILL GROW BIG...
Hadoop market (hardware, software, services): $1.5B in 2012, growing to $50.2B in 2020
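The two market figures imply a compound annual growth rate (CAGR) of roughly 55% per year over the eight years. A minimal Python sketch of that calculation, assuming the standard CAGR formula and treating 2012 and 2020 as the start and end points:

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate implied by two values `years` apart."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Figures from the slide: $1.5B (2012) to $50.2B (2020), i.e. 8 years.
    rate = cagr(1.5, 50.2, 2020 - 2012)
    print(f"Implied CAGR: {rate:.1%}")
```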
SAS ON HADOOP
SAS FOR HADOOP VISION
To be the Analytic and Data Management solution of choice for Hadoop.
THE ANALYTICS LIFECYCLE IN A BIG DATA WORLD
[Figure: analytics lifecycle: Manage Data, Explore Data, Develop Models, Deploy & Monitor]
SAS & HADOOP – A TECHNOLOGICAL PERSPECTIVE
SAS FROM Hadoop (Hive): SAS can treat Hadoop just as any other data source, pulling data FROM Hadoop when it is most convenient.
SAS IN Hadoop (Scoring Accelerator, Code Accelerator, Impala): SAS can work directly IN Hadoop, leveraging the distributed processing capabilities of Hadoop.
SAS WITH Hadoop (HPA, LASR): SAS can work WITH Hadoop, lifting data into a purpose-built, in-memory advanced analytics environment.
Continuity, Transition & Evolution
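The difference between pulling data FROM Hadoop and working IN Hadoop is mainly about what crosses the network. A toy sketch in Python, not SAS code and not how SAS implements either pattern: the node names and values are hypothetical, and each partition stands in for data held on one Hadoop data node. Both functions compute the same mean, but the push-down version moves only a (sum, count) pair per node.

```python
from statistics import mean

# Hypothetical cluster: each data node holds one partition of a numeric column.
PARTITIONS = {
    "datanode1": [10.0, 12.0, 14.0],
    "datanode2": [20.0, 22.0],
    "datanode3": [30.0, 31.0, 32.0, 33.0],
}


def mean_pull_from(partitions):
    """'FROM' pattern: ship every row to the client, then compute there."""
    rows = [value for part in partitions.values() for value in part]
    return mean(rows), len(rows)  # (result, number of values moved)


def mean_push_in(partitions):
    """'IN' pattern: each node computes a partial (sum, count); only partials move."""
    partials = [(sum(part), len(part)) for part in partitions.values()]  # runs "on" each node
    total, count = map(sum, zip(*partials))
    return total / count, 2 * len(partials)  # (result, two numbers moved per node)


if __name__ == "__main__":
    print(mean_pull_from(PARTITIONS))
    print(mean_push_in(PARTITIONS))
```

With three small partitions the saving is modest, but the partial-aggregate traffic stays at two numbers per node no matter how many rows each node holds.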
THE ANALYTICS LIFECYCLE IN A BIG DATA WORLD
[Figure: analytics lifecycle: Manage Data, Explore Data, Develop Models, Deploy & Monitor]
DATA LOADER FOR HADOOP!
• Point & click user menus
• Little or no Hadoop experience needed
• Self-service UI: HTML5 interface
• Enables a self-service approach to managing data in a Hadoop environment
SAS® DATA INTEGRATION STUDIO
• Intuitive point-and-click designer tool for the developer
• Metadata driven, enabling lineage and documentation
• Push processing down to Hadoop for ELT execution
SAS® DATA INTEGRATION STUDIO
• File transfer TO/FROM Hadoop
• Submit explicit HiveQL to the Hadoop server
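Pushing ELT processing down to Hadoop means sending a statement for the cluster to execute rather than moving the rows out. A minimal Python sketch that only builds such a HiveQL statement as text; the table and column names are hypothetical, and how the statement is then submitted (for example from SAS Data Integration Studio) is not shown.

```python
import re

# Plain identifiers, and table names optionally qualified with a database (db.table).
_COLUMN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_TABLE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def build_elt_hiveql(source_table, target_table, group_by, measure):
    """Build a HiveQL INSERT ... SELECT that aggregates inside Hadoop (ELT).

    Only this statement text would travel to the cluster; the rows stay in HDFS.
    Identifiers are validated because they are interpolated into the SQL text.
    """
    for table in (source_table, target_table):
        if not _TABLE.fullmatch(table):
            raise ValueError(f"invalid table name: {table!r}")
    for column in (group_by, measure):
        if not _COLUMN.fullmatch(column):
            raise ValueError(f"invalid column name: {column!r}")
    return (
        f"INSERT OVERWRITE TABLE {target_table}\n"
        f"SELECT {group_by}, SUM({measure}) AS total_{measure}\n"
        f"FROM {source_table}\n"
        f"GROUP BY {group_by}"
    )


if __name__ == "__main__":
    print(build_elt_hiveql("staging.sales_raw", "mart.sales_by_country", "country", "amount"))
```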
DATA FEDERATION DEFINITION
Enables seamless access to data from multiple disparate sources through integrated, virtualized data views, without physically moving data.
• IMPROVES GOVERNANCE via centralized security and auditing
• INCREASES AGILITY by enabling quicker responses to reporting requests and to business needs
• IMPROVES PERFORMANCE by caching commonly used queries
• SIMPLIFIES ACCESS via an abstraction layer that hides the sources of data from the consumer
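The federation idea (a virtual view over several sources, plus caching of commonly used queries) can be sketched in a few lines of Python. This is an illustrative toy, not the SAS Federation Server API: the class, the sources and their rows are hypothetical, and governance features such as security and auditing are left out.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]


class FederatedView:
    """Toy virtual view: rows are fetched from the sources at query time, never copied into a new store."""

    def __init__(self, sources: Dict[str, Callable[[], Iterable[Row]]]):
        self._sources = sources                      # source name -> function returning rows
        self._cache: Dict[Tuple[str, object], List[Row]] = {}
        self.source_reads = 0                        # how often the underlying sources were hit

    def where(self, column: str, value: object) -> List[Row]:
        """Return rows from all sources where row[column] == value."""
        key = (column, value)
        if key not in self._cache:                   # only go to the sources on a cache miss
            result: List[Row] = []
            for name, fetch in self._sources.items():
                self.source_reads += 1
                for row in fetch():
                    if row.get(column) == value:
                        result.append({**row, "_source": name})  # tag rows with their origin
            self._cache[key] = result
        return list(self._cache[key])


if __name__ == "__main__":
    view = FederatedView({
        "hive_customers": lambda: [{"id": 1, "country": "BE"}, {"id": 2, "country": "LU"}],
        "crm_db": lambda: [{"id": 3, "country": "BE"}],
    })
    print(view.where("country", "BE"))  # first call reads both sources
    print(view.where("country", "BE"))  # second call is served from the cache
    print(view.source_reads)
```

The consumer only sees the view; which source a row came from is an implementation detail, exposed here through the `_source` tag purely for illustration.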
SAS® FEDERATION SERVER
HADOOP HIVE & IMPALA ACCESS
BOOSTING THE ANALYTICS LIFECYCLE IN A BIG DATA WORLD
[Figure: analytics lifecycle: Manage Data, Explore Data, Develop Models, Deploy & Monitor]
To enable analytics in this changing environment, you need to:
bring the Analytics to the Data…
…and run it in-memory in a distributed mode
SAS® LASR™ ANALYTIC SERVER ARCHITECTURE
Massively Parallel Processing ('MPP') in the context of SAS® Visual Analytics...
[Figure: metadata, mid-tier and workspace server in front of the SAS In-Memory Server; several LASR clusters, each a SAS® LASR Analytic Server running on nodes with co-located data storage (memory, storage, processing); data sources: Hadoop, RDBMS, nonrelational, ERP, unstructured, PC files]
LOADING DATA FROM HADOOP INTO SAS IN-MEMORY
• The data is read directly from the source DataNodes and written to the target data/worker nodes in parallel.
• The data load process bypasses the Name/Root Node bottleneck on both sides.
[Figure: source blades (Blade 0: HDFS NameNode; Blades 1 to N: HDFS DataNodes) paired with target blades (Blade 0: LASR root node; Blades 1 to N: LASR worker nodes)]
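The parallel load pattern can be sketched in Python with threads standing in for blades. All names and data are hypothetical, and this is not how the SAS LASR Analytic Server is implemented: the point is only that each DataNode block is copied straight to its paired worker, while the central node contributes block locations (metadata) and no rows.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical block map, as a NameNode would report it: block id -> DataNode holding it.
BLOCK_LOCATIONS = {"blk_1": "datanode1", "blk_2": "datanode2", "blk_3": "datanode3"}

# Hypothetical block contents stored on each DataNode.
DATANODE_BLOCKS = {
    "datanode1": {"blk_1": [1, 2, 3]},
    "datanode2": {"blk_2": [4, 5]},
    "datanode3": {"blk_3": [6, 7, 8, 9]},
}

# Pairing of source DataNodes with target worker nodes (Blade k -> Blade k).
WORKER_FOR = {"datanode1": "worker1", "datanode2": "worker2", "datanode3": "worker3"}


def parallel_load():
    """Load every block straight from its DataNode into the paired worker's memory."""
    worker_memory = {w: [] for w in WORKER_FOR.values()}
    locks = {w: threading.Lock() for w in worker_memory}

    def load_block(block_id, datanode):
        rows = DATANODE_BLOCKS[datanode][block_id]  # read on the source DataNode
        worker = WORKER_FOR[datanode]
        with locks[worker]:
            worker_memory[worker].extend(rows)      # write on the target worker node
        return len(rows)                            # only a row count flows back

    # Block locations are metadata; the rows never pass through a central node.
    with ThreadPoolExecutor(max_workers=len(BLOCK_LOCATIONS)) as pool:
        counts = list(pool.map(load_block, BLOCK_LOCATIONS.keys(), BLOCK_LOCATIONS.values()))
    return worker_memory, sum(counts)


if __name__ == "__main__":
    memory, total = parallel_load()
    print(memory, total)
```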
SAS IN-MEMORY ANALYTICS PLATFORM
[Figure: In-Memory Analytics Platform spanning SAS Visual Analytics, SAS Visual Statistics and SAS In-Memory Statistics; capabilities: approachable analytics (exploration, reporting, decision trees, forecasting), approachable modelling (classification, regression, clustering), advanced analytics (recommendation engine, machine learning, text analytics), interactive data prep]
HOME LENDING USE CASE – IMPACT
THE ANALYTICS LIFECYCLE IN A BIG DATA WORLD
[Figure: analytics lifecycle: Manage Data, Explore Data, Develop Models, Deploy & Monitor]
SAS WITHIN THE HADOOP ECOSYSTEM: SUMMARY
[Figure: layered stack (user interface, metadata, data access, data processing, file system) serving the SAS® user and the next-gen SAS® user. SAS components: SAS® Visual Analytics, SAS® In-Memory Statistics for Hadoop, SAS® Enterprise Miner™, SAS® Data Integration, SAS® Enterprise Guide®, SAS Metadata, Base SAS & SAS/ACCESS® to Hadoop™, SAS Embedded Process, Accelerators, SAS® LASR™ Analytic Server and SAS® High-Performance Analytic Procedures (MPI based, in-memory data access). Hadoop components: HDFS, MapReduce, Pig, Hive]
SAS @ HADOOP: KEY BENEFITS
• Reduced data movement: processing where the data resides
• Massive performance boost: in-memory parallel processing for analytic workloads
• Simplified IT landscape: analytical and transactional processing on a single data store
Operationalize analytics to transform business
[Figure: Hadoop, in-memory data, SAS HPA procedures]
OTHER RECENT CASES
• Telematics: insurance service provider
• Credit scoring: bank
PARTNER KNOWLEDGE BOOSTER
• 12.02.15 - Hadoop
• 05.03.15 - Data Integration & Management
• 11.03.15 - Customer Decision Hub
• 19.03.15 - Visual Analytics + Visual Statistics