SAS on Hadoop - Knowledge Booster for Partners

Company Confidential - For Internal Use Only Copyright © 2014, SAS Institute Inc. All rights reserved. SAS ON HADOOP KNOWLEDGE BOOSTER Michel Philippens Frederik Vandenberghe Edwin van Waes

Post on 30-May-2020

SAS ON HADOOP KNOWLEDGE BOOSTER

Michel Philippens, Frederik Vandenberghe, Edwin van Waes

STRATEGIC GOALS AND OBJECTIVES 2015-2018

Copyright © 2014, SAS Institute Inc. All rights reserved. Some content is considered SAS company confidential

BY 2018
• 50% of overall business = Partner business
• 20% of overall business = Luxembourg
• Territory sales as growth engine for the future

THE CHALLENGE AHEAD

BREAK

Introduction HADOOP

HADOOP: a new data platform

SAS on HADOOP

Early Customer Cases

BASE CAMP

INTRODUCTION HADOOP

HADOOP FUNDAMENTALS

[Figure labels: Head, Data Nodes]

HADOOP : AN IMPORTANT ENABLER FOR BIG DATA ANALYTICS

SELECTION OF HADOOP PROJECTS

HADOOP DISTRIBUTOR PARTNERSHIPS

Intel recently invested $740 million to buy 18%, which puts their value at around the $4 billion mark!

HP recently invested $50 million in Hortonworks to get a place on the board. Total investment now about $300 million.

Big Teradata and SAP partners!

Google Capital recently invested $80 million in MapR; they gathered $110 million of investment in their last round!

IBM InfoSphere BigInsights

Amazon Elastic MapReduce (EMR)

Pivotal HD

GE invested $105 million in Pivotal

HADOOP AS A NEW DATA PLATFORM

IT AND BUSINESS DRIVERS FOR HADOOP

Hadoop as a Data Platform (standalone or as part of a broader ecosystem)

Hadoop as a component of the next generation of Business Analytics

... to support innovative use cases

... to support an IT Transformation

[Figure: analytics lifecycle wheel with the stages MANAGE DATA, EXPLORE DATA, DEVELOP MODELS, DEPLOY & MONITOR]

WHERE WE ARE TODAY?

Operational Data Sources

EDW

Data Mart

Data Mart

Analytic Mart

Analytic Mart

Traditional BI

Analytics

Some isolated islands of analytics

Traditional DWH: not agile, purpose-built, not detailed enough, high cost, slow

New types of data: external & unstructured data

Increased appetite for analytics & innovation

WHERE ARE WE HEADING?

Operational Data Sources

EDW

Data Mart

Data Mart

Traditional BI

Analytics

Low cost

Agile (schema on read)

Highly performant

HADOOP seen as analytics enabler

Business can think «big» again

MULTIPLE DATA ARCHITECTURES

• As new data store
• As additional input to the EDW
• As foundation for BI & Analytics
• As staging layer

THE ELEPHANT IS GOING TO GET BIG...

Hadoop market (hardware, software, services): $1.5B in 2012, $50.2B in 2020
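The two market figures on this slide imply a compound annual growth rate. The sketch below works out that arithmetic. It assumes the $1.5B figure belongs to 2012 and the $50.2B figure to 2020, and that growth compounds over the eight years between them; the function name `cagr` is illustrative and not taken from the slides.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` years."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures in USD billions, as read from the slide (assumed year mapping).
market_2012 = 1.5
market_2020 = 50.2

growth = cagr(market_2012, market_2020, 2020 - 2012)
print(f"Implied CAGR 2012-2020: {growth:.1%}")
```

Under these assumptions the implied rate is roughly 55% per year.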

SAS ON HADOOP

SAS FOR HADOOP VISION

To be the Analytic and Data Management solution of choice for Hadoop. 

THE ANALYTICS LIFECYCLE IN A BIG DATA WORLD

[Figure: analytics lifecycle wheel with the stages MANAGE DATA, EXPLORE DATA, DEVELOP MODELS, DEPLOY & MONITOR]

SAS & HADOOP – A TECHNOLOGICAL PERSPECTIVE

SAS FROM Hadoop (Hive, Impala)

SAS can treat Hadoop just as any other data source, pulling data FROM Hadoop when it is most convenient.

SAS WITH Hadoop (HPA, LASR)

SAS can work WITH Hadoop, lifting data into a purpose-built advanced analytics in-memory environment.

SAS IN Hadoop (Score A, Code A)

SAS can work directly IN Hadoop, leveraging the distributed processing capabilities of Hadoop.

Continuity, Transition & Evolution
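The difference between pulling data FROM a source and processing IN the source can be illustrated with a small, generic sketch. It is not SAS code: an in-memory SQLite database stands in for the Hadoop side, and the table `sales` and both helper functions are hypothetical. The point is only that pushing the aggregation to where the data resides moves fewer rows to the client.

```python
import sqlite3


def build_demo_source() -> sqlite3.Connection:
    """Create a tiny stand-in data source (hypothetical table and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("north", 10.0), ("south", 20.0), ("north", 5.0), ("south", 1.5)],
    )
    return conn


def totals_pull(conn):
    """FROM style: fetch every row, then aggregate on the client."""
    rows = conn.execute("SELECT region, amount FROM sales").fetchall()
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0.0) + amount
    return totals, len(rows)


def totals_pushdown(conn):
    """IN style: let the source aggregate and return only the result."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
    return dict(rows), len(rows)


conn = build_demo_source()
pulled, rows_moved_pull = totals_pull(conn)
pushed, rows_moved_push = totals_pushdown(conn)
print(pulled == pushed, rows_moved_pull, rows_moved_push)
```

Both approaches give the same totals; the pull variant transfers every detail row, while the pushdown variant transfers one row per region.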

THE ANALYTICS LIFECYCLE IN A BIG DATA WORLD

[Figure: analytics lifecycle wheel with the stages MANAGE DATA, EXPLORE DATA, DEVELOP MODELS, DEPLOY & MONITOR]

DATA LOADER FOR HADOOP!

Point & Click User Menus

Little or no Hadoop experience needed

Self-Service UI, HTML5 Interface

Enables a self-service approach to managing data in a Hadoop environment

SAS® DATA INTEGRATION STUDIO

• Intuitive point-and-click designer tool for the developer
• Metadata driven, enabling lineage and documentation
• Push processing down to Hadoop for ELT execution

SAS® DATA INTEGRATION STUDIO

File transfer to and from Hadoop.

Submit explicit HiveQL to the Hadoop server.

DATA FEDERATION DEFINITION

Enables seamless access to data from multiple disparate sources through integrated, virtualized data views, without physically moving data.

• IMPROVES GOVERNANCE via centralized security and auditing

• INCREASES AGILITY by enabling quicker responses to reporting requests and more readily responding to business needs

• IMPROVES PERFORMANCE by caching commonly used queries

• SIMPLIFIES ACCESS via an abstraction layer that hides the sources of data from the consumer
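A minimal sketch of the federation idea follows. It is a toy model, not the SAS Federation Server API: the class `FederatedView`, the two source functions, and the sample rows are all hypothetical. It shows a virtual view that queries several sources on demand without copying them, hides which source a row came from, and caches repeated queries.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class FederatedView:
    """Virtual view over several sources; rows stay in the sources."""

    def __init__(self, sources: Dict[str, Callable[[], List[Row]]]):
        self._sources = sources
        self._cache: Dict[str, List[Row]] = {}
        self.source_reads = 0  # counts how often a source is actually hit

    def query(self, column: str, value: object) -> List[Row]:
        key = f"{column}={value!r}"
        if key in self._cache:  # commonly used queries are served from cache
            return self._cache[key]
        result: List[Row] = []
        for fetch in self._sources.values():
            self.source_reads += 1
            result.extend(row for row in fetch() if row.get(column) == value)
        self._cache[key] = result
        return result


def hadoop_rows() -> List[Row]:
    # Hypothetical rows standing in for a Hive table.
    return [{"customer": "A", "country": "BE"}, {"customer": "B", "country": "LU"}]


def warehouse_rows() -> List[Row]:
    # Hypothetical rows standing in for an EDW table.
    return [{"customer": "C", "country": "BE"}]


view = FederatedView({"hadoop": hadoop_rows, "edw": warehouse_rows})
first = view.query("country", "BE")
second = view.query("country", "BE")  # answered from the cache
print(len(first), view.source_reads)
```

The consumer only sees `view.query`; which system holds a given row is an implementation detail of the view.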

SAS® FEDERATION SERVER

HADOOP HIVE & IMPALA ACCESS

BOOSTING THE ANALYTICS LIFECYCLE IN A BIG DATA WORLD

[Figure: analytics lifecycle wheel with the stages MANAGE DATA, EXPLORE DATA, DEVELOP MODELS, DEPLOY & MONITOR]

To enable analytics in this changing environment, you need to:

bring the Analytics to the Data…

…and run it in-memory in a distributed mode

SAS® LASR™ ANALYTICS SERVER ARCHITECTURE

[Figure labels: Metadata; Mid-Tier; Workspace Server; SAS In-Memory Server; three LASR Clusters, each showing a SAS® LASR Analytic Server on top of Co-Located Data Storage; layers MEMORY, PROCESSING, STORAGE, DATA SOURCES; data sources: Hadoop, RDBMS, nonrelational, ERP, unstructured, PC files]

Massively Parallel Processing (‘MPP’) in the context of SAS® Visual Analytics…

LOADING DATA FROM HADOOP INTO SAS IN-MEMORY

• The data is read directly from the source DataNodes and written to the target data/worker nodes in parallel.

• The data load process bypasses the Name/Root Node bottleneck on both sides.

Source HDFS cluster: Blade 0 NameNode; Blades 1 to N DataNodes

Target LASR cluster: Blade 0 LASR Root Node; Blades 1 to N LASR Worker Nodes
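The parallel load pattern described on this slide can be sketched as a toy simulation. It is not the SAS or HDFS implementation: threads in a single process stand in for cluster nodes, and the node names, the block layout, and the one-to-one pairing of DataNodes with workers are assumptions made for illustration. Each transfer goes straight from a DataNode to its worker, with no central node relaying the rows.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical block layout: DataNode name -> rows stored on that node.
datanode_blocks = {
    "datanode-1": [1, 2, 3],
    "datanode-2": [4, 5],
    "datanode-3": [6, 7, 8, 9],
}


def transfer(datanode: str) -> tuple:
    """Copy one DataNode's rows directly to its paired worker node."""
    worker = datanode.replace("datanode", "worker")
    return worker, list(datanode_blocks[datanode])


def parallel_load(max_workers: int = 3) -> dict:
    """Run all node-to-node transfers concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(transfer, datanode_blocks))


worker_memory = parallel_load()
print(worker_memory)
```

Because every pair transfers independently, adding nodes adds transfer capacity instead of queueing more traffic behind a single head node.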

SAS IN-MEMORY ANALYTICS PLATFORM

In-Memory Analytics Platform

• Interactive Data Prep
• Approachable Analytics: Exploration, Reporting, Decision Trees, Forecasting
• Approachable Modelling: Classification, Regression, Clustering
• Advanced Analytics: Recommendation Engine, Machine Learning, Text Analytics

Products: SAS Visual Analytics, SAS Visual Statistics, SAS In-Memory Statistics

HOME LENDING USE CASE – IMPACT

THE ANALYTICS LIFECYCLE IN A BIG DATA WORLD

[Figure: analytics lifecycle wheel with the stages MANAGE DATA, EXPLORE DATA, DEVELOP MODELS, DEPLOY & MONITOR]

SAS WITHIN THE HADOOP ECOSYSTEM : SUMMARY

[Figure: layered diagram]

Users: SAS® User, Next-Gen SAS® User

Layers: User Interface, Metadata, Data Access, Data Processing, File System

Components shown: SAS® Visual Analytics, SAS® Enterprise Miner™, SAS® Data Integration, SAS® Enterprise Guide®, SAS® In-Memory Statistics for Hadoop, SAS Metadata, Base SAS & SAS/ACCESS® to Hadoop™, In-Memory Data Access, Hive, Pig, Map Reduce, SAS Embedded Process Accelerators, MPI Based: SAS® LASR™ Analytic Server and SAS® High-Performance Analytic Procedures, HDFS

SAS @ HADOOP : KEY BENEFITS

• Reduced Data Movement: processing where the data resides
• Massive Performance Boost: in-memory parallel processing for analytic workloads
• Simplified IT Landscape: analytical and transactional processing on a single data store

Operationalize Analytics to Transform Business

[Figure labels: Answer, Hadoop, In-Memory Data, SAS HPA Procedures]

SOME EARLY USE CASES

ROGERS MEDIA MACY’S MEDIA

OTHER RECENT CASES

Telematics: Insurance Service Provider

Credit Scoring: Bank

Q&A

12.02.15 - Hadoop

05.03.15 - Data Integration & Management

11.03.15 - Customer Decision Hub

19.03.15 - Visual Analytics + Visual Statistics

Partner Knowledge Booster

sas.com

QUESTIONS?