simplifying big data - oracle · simplifying big data best prac*ces for on-premises and cloud...

53

Upload: others

Post on 20-May-2020

12 views

Category:

Documents


0 download

TRANSCRIPT

Copyright©2015,Oracleand/oritsaffiliates.Allrightsreserved.|

SimplifyingBigDataBestPrac*cesforOn-PremisesandCloudArchitecturesCON8746

PaulKentVicePresidentBigData,SASJean-PierreDijcksMasterProductManagerBigData,Oracle

Presentedwith

Copyright©2015,Oracleand/oritsaffiliates.Allrightsreserved.|

SafeHarborStatementThefollowingisintendedtooutlineourgeneralproductdirecQon.ItisintendedforinformaQonpurposesonly,andmaynotbeincorporatedintoanycontract.Itisnotacommitmenttodeliveranymaterial,code,orfuncQonality,andshouldnotberelieduponinmakingpurchasingdecisions.Thedevelopment,release,andQmingofanyfeaturesorfuncQonalitydescribedforOracle’sproductsremainsatthesolediscreQonofOracle.

3

Copyright©2015,Oracleand/oritsaffiliates.Allrightsreserved.|

•  OracleBigDataAppliance–  SimpletoDeploy–  FastoutoftheBox–  LowerCostthanDIYClusters

•  OracleBigDataCloudService–  Samearchitectureason-premises–  Highperformancedataprocessing–  EnablesBigDataSQLinOraclePublicCloud

4

•  OracleBigDataSQL–  OracleSQLacrossALLyourdata–  SecureaccesstodatainNoSQL,HadoopandOracleDatabase

–  Highperformanceonun-modeleddata

•  OracleExadata–  Bestpla]ormforallOracleDatabaseworkloads

–  Uniqueso`warethatmaximizesOracleDatabase

–  Standardize,opQmized,hardenedend-to-end

BigDataPla]ormopQonsfromOracle

Copyright©2015,Oracleand/oritsaffiliates.Allrightsreserved.| 5

100%UpwardCompaQbilitywithOn-PremisesEnablesCoexistenceandMigraQon

OracleCloud

CoExistenceandMigra*on

SameArchitecture

SameStandards

SameProducts

PrivateCloud

Copyright©2015,Oracleand/oritsaffiliates.Allrightsreserved.|

ProgramAgenda

RightEngine–RightJob

RightPla]ormfortheEngine

ConfiguringforMixedWorkloads

Real-WorldExampleswithSAS

1

2

3

4

6

Copyright©2015,Oracleand/oritsaffiliates.Allrightsreserved.|

RightEngine–RightJob

7

Copyright©2015,Oracleand/oritsaffiliates.Allrightsreserved.|

Asimplesetofcriteria

8

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5Concurrency

ComplexQueryResponseTimes

SingleRecordRead/WritePerformance

BulkWritePerformance

PrivilegedUserSecurity

GeneralUserSecurity

GovernanceTools

SystemperTBCost

BackupperTBCost

SkillsAcquisiQonCost

RDBMS

NoSQLDB

Hadoop

PerformanceSecurityCost

Copyright©2015,Oracleand/oritsaffiliates.Allrightsreserved.|

High-levelTechnologyComparisonHDFS NoSQL RDBMS

DataType Chunk Record TransacQon

WriteType Synchronous EventuallyConsistent ACIDCompliant

DataPreparaQon NoParsing NoParsing ParsingandValidaQon

9

Ingest

Copyright©2015,Oracleand/oritsaffiliates.Allrightsreserved.|

High-levelTechnologyComparisonHDFS NoSQL RDBMS

DataType Chunk Record TransacQon

WriteType Synchronous EventuallyConsistent ACIDCompliant

DataPreparaQon NoParsing NoParsing ParsingandValidaQon

DRType SecondCluster NodeReplica SecondRDBMS

DRUnit File Record TransacQon

DRTiming Batch Record TransacQon

10

Ingest

DR

Copyright©2015,Oracleand/oritsaffiliates.Allrightsreserved.| 11

High-levelTechnologyComparisonHDFS NoSQL RDBMS

DataType Chunk Record TransacQon

WriteType Synchronous EventuallyConsistent ACIDCompliant

DataPreparaQon NoParsing NoParsing ParsingandValidaQon

DRType SecondCluster NodeReplica SecondRDBMS

DRUnit File Record TransacQon

DRTiming Batch Record TransacQon

ComplexAnalyQcs? Yes No Yes

QuerySpeed Slow FastforsimplequesQons Fast

#ofDataAccessMethods One(fulltablescan) One(indexlookup) Many(OpQmized)

Ingest

DR

Acces

AffordableScale LowPredictableLatency FlexiblePerformance

Copyright©2015,Oracleand/oritsaffiliates.Allrightsreserved.|

RightPla]ormfortheEngine

12

Copyright©2015,Oracleand/oritsaffiliates.Allrightsreserved.|

BigDataApplianceX5-2Hardware

•  SunOracleX5-2LServers,witheach– 2*18coreIntelXeonE5-2699v3– 128GBDDR4Memory– 96TBSASDiskSpace

•  3InfiniBandswitchesforallinternalHadoopandSparktraffic

•  1CiscoManagementSwitch

Note:SupportforClouderaCDHDataHubincluded

13

Copyright©2015,Oracleand/oritsaffiliates.Allrightsreserved.|

HadoopHardware–PricingTrendswithBDA(List)

14

$-

$500.00

$1,000.00

$1,500.00

$2,000.00

$2,500.00

BDAX2-2 BDAX3-2 BDAX4-2 BDAX5-2

PriceperTB(diskraw)

PriceperCore

PriceforGB/Mem

Copyright©2015,Oracleand/oritsaffiliates.Allrightsreserved.|

Storage•  DensedrivesprovideflexibilityaswellaslowcostperTB

•  HDFSsupportsdensedriveswithmorescalabilityenhancementsinCDH5.5

•  RecommendaQon:Storageischeap,ignoreitforsizing

CPU•  “Hadoop”nodesareincreasinglyrunningmixedworkloadsand“externalprocesses”:–  HDFS–  ETL&StreamProcessing– MachineLearning

•  RecommendaQon:SizeCPUspernodetowardsthehigh-endcorecounts

15

Memory•  Mixedworkloadsandexternalprocessesdoneedmemory

•  LargeDIMMsdosQllcommandapremiumin2-socketservers

•  RecommendaQon:Sizememorytothemid-spectrumduetohockeysQckeffects.Thisischanging!

HardwareProfilesforHadoop

Copyright©2015,Oracleand/oritsaffiliates.Allrightsreserved.|

HardwareProfilesforHadoop–Conclusion•  Stopworryingaboutdiskspace.Itischeapandavailable.

– SSD?OncerealdataQeringisavailable,orforNoSQLDatabases• Worryinsteadabout:

– Network•  Expensiveando`entheforgozenbozleneck•  Thinkaboutdataingestandtherequiredcapacity

– CPU• MixedworkloadswillrequiremoreCPUswithinthenode•  EveryonewantstoruncomputeatthesameQme

– Memory•  BeconsciousofactualneedsandrestricQons•  NotallSWcanmakeuseoflotsofmemory

16

Bewaryofmythsandlore.Thehardwareworldhaschanged…

Instead,dothemathandcalculatethebozlenecksandusagesyou

expect

Copyright©2015,Oracleand/oritsaffiliates.Allrightsreserved.|

VirtualizaQonProfilesforHadoop

17

Disk

CPU

Copyright©2015,Oracleand/oritsaffiliates.Allrightsreserved.|

VirtualizaQonProfilesforHadoop

18

Performance/DataStorage

Disk

CPU

VM1 VM2

Hyperviseroverheadistypicallylow,buttheIOoverheadhasimpact,whichiswhySR-IOVisappliedtoensurehighthroughput

Retain“directazacheddisks”forHadoopworkloads

Goal:Datapermanentlylivesinthe“Hadoopnodes”

Copyright©2015,Oracleand/oritsaffiliates.Allrightsreserved.|

VirtualizaQonProfilesforHadoop

19

Flexibility/Applica*onHos*ng

Disk

CPU

VM-C1 VM-C2

VM-S1 VM-S2

Drivetowardsflexibilityattheexpenseofperformance

Goal:Spinupenvironmentsandthedataquickly

Copyright©2015,Oracleand/oritsaffiliates.Allrightsreserved.|

RecommendaQonsandFuture?•  Ifyoudon’tneedthem,don’tusethem

– RememberyouwillneedtoapplyOSupdatestoallVMs,andwhentheyhavedata,thismaynotbedesirable

• Don’tcreatemanysmallVMs,rathercreatelargeVMs– PayazenQontothenumberofVMsonaphysicalnode,andconsiderwhathappensifthephysicalnodegoesoffline(dataloss?)

•  Futures– Docker?

20

Copyright©2015,Oracleand/oritsaffiliates.Allrightsreserved.|

ConfiguringforMixedWorkloads

21

Copyright©2015,Oracleand/oritsaffiliates.Allrightsreserved.|

SharingandManagingResources

• Note:thisisnotabout“security/isolaQon”butinsteadaboutmulQpleworkloads(tenants)onasinglecluster– HenceweareignoringVMs

•  Twomainways(outsideofVM/Docker)ofprevenQngresourcegrabs:– LinuxControlGroups(cgroups)– YetAnotherResourceNegoQator(YARN)

22

PrivateCloud

Copyright©2015,Oracleand/oritsaffiliates.Allrightsreserved.|

YARN• CapacityScheduler

– DesignedformulQpletenants– Basedonresourcequeues– Enablesadministratorstodistributeresourcestoappsandusers

•  FairScheduler– DesignedtoaverageoutresourcesperapplicaQon

– Doessupportqueues,butasaminimumguarantee

– Lowermaintenance,butlesscontrol

23

Copyright©2015,Oracleand/oritsaffiliates.Allrightsreserved.|

Whycgroups?•  YARNdoesrequiredeveloperstowriteapplicaQonsinaspecificway

– SomeQmesitisquickerandsimplertonotadheretoYARN

•  SomeQmestherequirementsarenotoverlycomplex– cgroupsenables“guarantees”by“hardparQQoning”ofthenodesattheLinuxlevel– cgroupsarestaQcbutcreateasimpletoadministerandrobustseparaQonofresources

24

Copyright©2015,Oracleand/oritsaffiliates.Allrightsreserved.|

RealWorldExampleswithSAS

25

Copyright©2015,Oracleand/oritsaffiliates.Allrightsreserved.|

4Segments

•  SASandOraclePartnership• DataLakeDream• Customer#1–Telco(Asia)• Customer#2–Banking

•  Thingsyoumightnotknow…

26

Big Data – Dream IT. Build IT. Realize IT Paul Kent, Vice President, Big Data, SAS Andy Mendelsohn, Senior Vice President, Database Server Technologies, Oracle Corporation Maureen Chew, Oracle Corporation Gary Granito, Oracle Corporation 488-2013

Copyr igh t © 2012 , SAS Ins t i tu te Inc . A l l r i gh ts reserved .

SAS AND ORACLE

WORKING TOGETHER TO CREATE CUSTOMER VALUE

•  Joint R & D development and Product Management teams in Cary and Redwood Shores

•  Focus on driving SAS technology components to run natively in Oracle database

•  Joint performance engineering optimizations

•  Template physical architectures developed based on use-cases

•  Physically tested and benchmarked together

•  Reduction in physical effort •  Overall reduction in lifecycle

costs

•  Best Practice papers •  SAS and Oracle Engineers

provide joint "Sizing and Architecture Analysis and Design"

29 Copyr igh t © 2013 , SAS Ins t i tu te Inc . A l l r i gh ts reserved .

ORACLE ENGINEERED SYSTEMS FOR

SuperCluster ExaData ExaLogic Virtual

Compute

Appliance

Big Data

Appliance

Database

Backup, Recovery,

Logging Appliance

ZFS

Storage

Appliance

Copyright©2015,Oracleand/oritsaffiliates.Allrightsreserved.|

DataLakeDream

FutureState:DataLakePazern

DemandDepositsData

CreditApplicaQons

Data

CreditCardTransacQons

MortgageData

200+DataSources

ForecasQng

Profitability

AssetTracking

Others

FinanceClientScanning

OperaQonalExposure

Commercial

Consumer

Risk

Federal

SEC

ComplianceReporQng

Others

RegulatoryMoneyLaundering

CheckFraud

LoanFraud

Fraud

Datamart

Datamart

DataMart

SAS

Cognos

Cognos

QlikView

QueryTool

SASSAS

SAS

MathRackAnalysts

/ 3 Stage Refinement

/source /transform /data

/card

/checking

/mortgage

/customer

/product

33 Copyr igh t © 2013 , SAS Ins t i tu te Inc . A l l r i gh ts reserved .

SAS BIG DATA ON BIG DATA APPLIANCE

•  Flexible Architectural options for SAS deployments •  Can run on Starter, Half and Full configurations

•  Optionally select nodes “N, N-1, N-2, …” for additional SAS Services such as SAS Compute Tier, SAS MidTier

•  Optionally select node subset “N, N-1, N-2, N-3, …) for more dedicated resources for SAS Analytic Compute Environment by shifting Big Data Appliance roles

•  Option to selectively add more memory on a per node basis depending on specific workload distribution

34 Copyr igh t © 2013 , SAS Ins t i tu te Inc . A l l r i gh ts reserved .

SAS Midtier

STARTER BDA

SAS Visual Analytics Metadata Server SAS Compute

SAS HPA Root Node

SAS VISUAL ANALYTICS, HIGH-PERFORMANCE ANALYTIC COMPUTE ENVIRONMENT CO-LOCATED WITH HADOOP

35 Copyr igh t © 2013 , SAS Ins t i tu te Inc . A l l r i gh ts reserved .

SAS Midtier

FULL RACK BDA

LASR Worker

17

HDFS Data 17

Metadata Server SAS Compute

SAS HPA Root Node

LASR Worker

18

HDFS Data 18

SAS VISUAL ANALYTICS, HIGH-PERFORMANCE ANALYTIC COMPUTE ENVIRONMENT CO-LOCATED WITH HADOOP

Copyright©2015,Oracleand/oritsaffiliates.Allrightsreserved.|

Customer#1TelcoProvider(Asia)

Copyr igh t © 2012 , SAS Ins t i tu te Inc . A l l r i gh ts reserved .

RAS

NOC

SOC

RAS

CEM

Collection

EPQM

HR

Supply Chain

TELCO (ASIA) CHALLENGES THE SITUATION…

•  IncreasingneedforData•  Moredataheldatmoregranularlevels•  ExpansionofAnalyQcsusersintheenterprise

DI Temp Space

Staging CMDM

CI Temp Space VA Workspace EM Workspace ADHOC

(ADM/ABTs)

ADM / ABTs

XXXXX BCA

XXXXX CLM

PLDT Home

PLDT EICB

XXXXX Network

IRM// Fraud Finance

Copyr igh t © 2012 , SAS Ins t i tu te Inc . A l l r i gh ts reserved .

TELCO (ASIA) BIG DATA ARCHITECTURE IMPLEMENTATION APPROACH

In Hadoop Processing

ADM & ABT DI Jobs (ETL Hadoop Target)

ADM & ABT DI Jobs (ETL SAS Target) SAN Storage

ADM DI Jobs using Sqoop (Hive)

ABT DI Jobs (ELT Hadoop Target)

(Pass Through + Explicit SQL + Hive Optimization)

Phase 0

Phase 1

Phase 2 EXADATA

BDA - CLOUDERA HADOOP

ADM (Hive)

ABTs (Hive)

Copyr igh t © 2012 , SAS Ins t i tu te Inc . A l l r i gh ts reserved .

TELCO (ASIA) DATA MANAGEMENT ON

HADOOP SAMPLE JOB FLOW - ADM

Source Table Extract Data

(Implicit SQL to partitioned table)

Table Loader (Proc Append)

Target Table (Hive) Phase 1

Phase 2 Source Table Sqoop Data

(Oozie & Sqoop Import)

Target Table (Hive)

Source Table Extract Data

(Implicit SQL to partitioned table)

Table Loader (Proc Append)

Target Table (SAS)

Phase 0

Copyr igh t © 2014 , SAS Ins t i tu te Inc . A l l r i gh ts reserved .

TELCO (ASIA) DATA MANAGEMENT ON

HADOOP

OPTIMIZED USING HADOOP SQOOP+OOZIE VIA CUSTOM SAS DI SQOOP TRANSFORM

SubjectArea TableLoadDurationinhours(pre-optimization)

LoadDuration,inhours(post-optimization)

Improvement(inhours)

%Improvement

DeltaLoadAggregationLevel

DeltaLoadDataSize(inGB)

FactTable1 0.7000 0.2383 0.4617 65.95% DAILY 5.60FactTable2 1.9500 0.5031 1.4469 74.20% WEEKLY 18.50FactTable3 3.8167 1.1969 2.6197 68.64% MONTHLY 37.20FactTable4 6.0667 0.6214 5.4453 89.76% DAILY 7.70FactTable5 0.5333 0.0617 0.4717 88.44% DAILY 1.70FactTable6 0.2500 0.0528 0.1972 78.89% DAILY 0.73FactTable7 0.4833 0.1011 0.3822 79.08% WEEKLY 2.20FactTable8 0.9667 0.2597 0.7069 73.13% MONTHLY 6.30FactTable9 5.7000 0.0697 5.6303 98.78% DAILY 1.70FactTable10 0.9333 0.1094 0.8239 88.27% WEEKLY 4.10FactTable11 1.7333 0.2517 1.4817 85.48% MONTHLY 8.10FactTable12 2.0833 0.2033 1.8800 90.24% DAILY 10.10FactTable13 4.4000 0.8367 3.5633 80.98% WEEKLY 39.80FactTable14 8.1333 2.6169 5.5164 67.82% MONTHLY 104.80FactTable15 0.4833 0.0447 0.4386 90.75% DAILY 0.75FactTable16 1.1000 0.0842 1.0158 92.35% WEEKLY 1.80FactTable17 1.3500 0.1008 1.2492 92.53% MONTHLY 3.60FactTable18 0.3833 0.0950 0.2883 75.22% DAILY 2.40FactTable19 0.6833 0.3014 0.3819 55.89% WEEKLY 14.70FactTable20 4.8333 1.1042 3.7292 77.16% MONTHLY 50.90

DimensionTable1 8.1000 2.9511 5.1489 63.57% N/A 213.70

AnalyticDataMart(ADM)

Copyr igh t © 2014 , SAS Ins t i tu te Inc . A l l r i gh ts reserved .

TELCO (ASIA) DATA MANAGEMENT ON

HADOOP OUTCOMES

•  Reuse existing workflows •  Retarget outputs to Hadoop Friendly Formats •  Selectively upgrade processing to Hadoop Optimal

•  Batch Window Dragon Slayed! •  Infrastructure no longer challenged by explosive growth

Copyright©2015,Oracleand/oritsaffiliates.Allrightsreserved.|

Customer#2Bank(Europe)

Copyr igh t © 2012 , SAS Ins t i tu te Inc . A l l r i gh ts reserved .

HPA/TKGridRoot Nodegbbdap01

HPA/TKGridWorkNode

gbbdap02

HPA/TKGridWorkNode

gbbdap03

NM/DNVA/VSSMP

Hadoop PRODNode (Node manager/Data Node)

HBASE/Others

SASClients

(EM, EG, SASStudio

) SAS/ACCESSTO HADOOP

NM/DNNM/DN NM/DN NM/DNNM/DN NM/DN NM/DN NM/DN

NM/DN

HPA/TKGridWorkNode

gbbdap04

HPA/TKGridWorkNode

gbbdap05

HPA/TKGridWorkNode

gbbdap06

HPA/TKGridWorkNode

gbbdap07

HPA/TKGridWorkNode

gbbdap08

HPA/TKGridWorkNode

gbbdap09

HPA/TKGridWorkNode

gbbdap20

HPA/TKGridWorkNode

gbbdap21

HPA/TKGridWorkNode

gbbdap22

HPA/TKGridWorkNode

gbbdap23

HPA/TKGridWorkNode

gbbdap24

HPA/TKGridWorkNode

gbbdap25

HPA/TKGridWorkNode

gbbdap26

HPA/TKGridWorkNode

gbbdap27

HPA/TKGridWorkNode

gbbdap19

SAS HPDMgbhpap01

SAS/ACCESSTO HADOOP

SSH

NM/DNNM/DN NM/DN NM/DNNM/DN NM/DN NM/DN NM/DN

NN

NM/DN

EP Embedded Process

EP EP EP EP EP EP EP EP EP

EP EP EP EP EP EP EP EPEP

BANK EUROPE REFERENCE DIAGRAM

Copyr igh t © 2012 , SAS Ins t i tu te Inc . A l l r i gh ts reserved .

Data exploration at massive scale

Intuitive visual

analytics

SAS® VISUAL ANALYTICS EXPLORER

Copyr igh t © 2012 , SAS Ins t i tu te Inc . A l l r i gh ts reserved .

Descriptive and Predictive Modeling

Model

comparison

Dynamic group-by processing

SAS® VISUAL STATISTICS

Copyr igh t © 2012 , SAS Ins t i tu te Inc . A l l r i gh ts reserved .

In-Memory Statistics for

Hadoop:

Programming interface for SAS model

development

SAS® IN-MEMORY STATISTICS FOR HADOOP

Copyr igh t © 2012 , SAS Ins t i tu te Inc . A l l r i gh ts reserved .

BANK EUROPE THINGS OF NOTE

•  1. SAS was not licensed for all nodes in the BDA

•  SAS EP jobs scheduled by YARN will have full visibility to data across the cluster •  SASHDAT (SAS High Performance Data Binary Format) needs data sets to stay on

the nodes licensed for SAS; some attention with Hadoop Balancer required.

•  2. SAS was not licensed for all cores on the nodes it was licensed for

•  SAS Licensing Posture has improved – you can license say 200 cores on a 1000 core cluster and control limits with CGROUPS and YARN

Copyright©2015,Oracleand/oritsaffiliates.Allrightsreserved.|

Thingsyoumightnotknow

49

SPD Engine with Hadoop §  Support for running on MapR 4.0.2

§  Support for Code Accelerator

§  Enhanced WHERE pushdown: AND, OR, NOT, parenthesis, range operators and in-lists

§  Parallel write support can improve write performance up to 40%

§  Optionally uses Apache Curator/Zookeeper as a distributed lock server. No more physical lock files.

libname spdat spde '/user/dodeca' hdfshost=default;

50 Copyr igh t © 2013 , SAS Ins t i tu te Inc . A l l r i gh ts reserved .

Traditional Grid Manager Grid

control node Compute nodes

Shared File System

Run SAS job

LSF

YARN Resource Manager

YARN NM

YARN NM

YARN NM

Local Disks

Shared File System

Run SAS job

SAS GRID MANAGER FOR HADOOP

Leverage the processing power of the cluster •  SAS runs completely in Hadoop

cluster •  YARN with Oozie prioritizes and

schedules SAS jobs in cluster •  All existing SAS Grid Manager

integrated (e.g. EM) capabilities •  Reduction of SAN storage •  No separate SAS compute tier

51 Copyr igh t © 2013 , SAS Ins t i tu te Inc . A l l r i gh ts reserved .

ORACLE ENGINEERED SYSTEMS FOR

SuperCluster ExaData ExaLogic Virtual

Compute

Appliance

ZFS

Storage

Appliance

Database

Backup, Recovery,

Logging Appliance

Big Data

Appliance

Copyright©2015,Oracleand/oritsaffiliates.Allrightsreserved.| 52