© 2016 IBM Corporation Concepts in Analytics Martin Schneider Offering Leader, DB2 Analytics Accelerator IBM Germany Research and Development GmbH


Post on 20-May-2020


Page 1

Concepts in Analytics

Martin Schneider, Offering Leader, DB2 Analytics Accelerator

IBM Germany Research and Development GmbH

Page 2

Disclaimer

© Copyright IBM Corporation 2015. All rights reserved. U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

IBM, the IBM logo, ibm.com, DB2, and DB2 for z/OS are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml Other company, product, or service names may be trademarks or service marks of others.

Page 3

Gartner CIO Agenda: Analytics further extends its number 1 position among technology topics

Source: Gartner “Flipping to Digital Leadership – Insights from the 2015 Gartner CIO Agenda Report”, gartner.com/cioagenda

Page 4

A new relationship with the customer

Then: “I have a product; I am looking for a customer for it.”

Now: “I have a customer; what do they need most?”

Diagram labels (shown for both views): Accounts, Commerce, Interaction, Branding

Page 5

Analytics decides between profit and loss

Use of IT as a business strategy

Metrics named on the slide:
• annual increase in Customer Lifetime Value for companies with engagement analytics
• share of marketers who send the same material to all customers
• share of companies that are “highly satisfied” with the provision of information for their work
• size of the typical fine when a bank breaches regulations
• estimated loss due to fraud in healthcare
• tax revenue lost due to non-compliance

Values shown on the slide (the pairing of values with metrics is not recoverable from the extracted text): 80%, 6%, +7.6%, $226 million, 16%, $100 million

Page 6

Accelerating the Client’s Journey to Cognitive Analytics

Natural, Intuitive or Automated Interaction

Context Specific Usage

Opportunities to infuse cognition and collaboration in existing solutions and products for differentiation.

Win on Innovation

Compete on time to business value – through context specific data, methods, workflow.

Reasoning

Learning

Natural Language

Optimization

Rules

Predictive Modeling

Forecasting

Statistical Analysis

Alerts

Drilldown Query

Ad-hoc Reports

Standard Reports

Big Data Platforms

ECM

Information Integration

RDBMS

Page 7

§ The need for cognitive analytics is driven by the confluence of SoLoMo (Social, Local, Mobile), Big Data, and Cloud

Veracity Variety

Velocity Volume

Cognitive Systems

Cognitive Analytics in the Context of Big Data – Key Drivers

Page 8

What is Apache Spark?

Page 9

What’s Spark?

Origin

Founding Sponsors: Google, Amazon, SAP, IBM

Sponsors: Adobe, Apple, Bosch, Cisco, Cray, EMC, Ericsson, Facebook, Huawei, Informatica, Intel, Microsoft, NetApp, Pivotal, VMware.

Affiliates: many

2002 – MapReduce @ Google
2006 – Hadoop @ Yahoo
2010 – Spark paper UC Berkeley
2011 – Hadoop 1.0 GA
2014 – Apache Spark top-level (most active)

Fast
§ In-memory distributed computing and JVM threads
§ Faster than MapReduce for some workloads

Ease of use (for programmers)
§ Written in Scala
§ Scala, Python and Java APIs
§ Scala and Python interactive shells
§ Runs clustered (Mesos, …), standalone or in cloud

General purpose
§ Covers a wide range of workloads
§ Provides a variety of complex analytics libraries: SQL, ML, Streaming, Graph

Logistic regression in Hadoop and Spark
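The chart on this slide compares logistic regression on Hadoop and on Spark. Logistic regression is an iterative algorithm that scans the same data set many times, which is the kind of workload where keeping data in memory tends to pay off. The following is a minimal sketch in plain Python, not the Spark API: `load_points`, the toy data, the iteration count and the learning rate are all assumptions made for illustration. The data source is read once and every gradient-descent pass then works on the in-memory copy.

```python
import math

LOAD_COUNT = 0  # how often the (hypothetical) data source has been read


def load_points():
    """Stand-in for an expensive read from distributed storage."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    # (features, label); the first feature is a constant bias term
    return [
        ([1.0, 0.0], 0), ([1.0, 1.0], 0), ([1.0, 2.0], 0),
        ([1.0, 4.0], 1), ([1.0, 5.0], 1), ([1.0, 6.0], 1),
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(iterations=2000, learning_rate=0.1):
    """Batch gradient descent for logistic regression."""
    points = load_points()  # read once, then keep in memory
    weights = [0.0] * len(points[0][0])
    for _ in range(iterations):
        gradient = [0.0] * len(weights)
        for features, label in points:  # each pass reuses the cached data
            score = sum(w * x for w, x in zip(weights, features))
            error = sigmoid(score) - label
            for i, x in enumerate(features):
                gradient[i] += error * x
        weights = [w - learning_rate * g / len(points)
                   for w, g in zip(weights, gradient)]
    return weights


def predict(weights, features):
    score = sum(w * x for w, x in zip(weights, features))
    return 1 if sigmoid(score) >= 0.5 else 0


WEIGHTS = train()
```

If `load_points` were called inside the loop instead, the sketch would read the source 2000 times, which roughly mirrors a design that goes back to disk between iterations.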

Page 10

Apache Spark

Spark Libraries:
• Spark SQL: Relational Operators
• Spark MLlib: Machine Learning
• Spark GraphX: Graph Processing
• Spark Streaming: Real-Time Streaming

Spark Core: General Execution Engine

Cluster Manager: YARN, Mesos, Standalone

Data Abstraction: HDFS / Cassandra / HBase / Parquet / ...

Languages: Java / Python / Scala / R

Page 11

Apache Spark z/OS

Available since Year End 2015 via Open Source

Securely Integrate OLTP and Business Critical Data

Integrate:
• DB2 for z/OS, IMS, VSAM, PDSE, Syslog, SMF, ...
• Remote (non-z) data on distributed servers, Hadoop, Oracle, ...

Page 12

Federated Analytics, Data in Place

Example: Integration of Spark Analytics with Transaction Systems

Key Values:
• Optimized access and z/OS governed ‘in-memory’ capabilities for core business data, leveraging open source analytic frameworks
• Consistent analytic interfaces for SQL, Graph, Machine Learning across multiple data and system environments
• Leverage of emerging Spark skills and commercial solution ecosystem built on Apache Spark for fast ROI and agility
• Integration of analytics across core systems, social data, website information, etc.

Diagram labels: Core data; Core transactions; z/OS; Linux on Z; SMS gateway; Twitter; Sentiment Analysis; Spark nodes; Apache Spark SQL; Qualify candidate for promotion offer; Initiate Offer

CICS Banking Process:
• Process transaction
• Score risk of fraud
• Qualify & initiate promotion offer

Page 13

Spark empowers users to accelerate the insight economy

Roles: Data Scientist, Data Engineer, App Developer

Role nicknames shown on the slide: “the convincer”, “the builder”, “the thinker”

Data Scientist

What they want to do:
§ Identify patterns, trends, risks, and opportunities in data
§ Discover new actionable insights

How Spark can help:
§ Supports the entire data science workflow: from data access and integration, to analysis, to visualization
§ Provides a growing library of machine learning algorithms

Data Engineer

What they want to do:
§ Bridge between the Data Scientist and the App Developer
§ Implement machine learning algorithms at scale
§ Put the right data system to work for the job at hand

How Spark can help:
§ Abstracts data access complexity (Spark doesn’t care what your data store is)
§ Enables solutions at web-scale

App Developer

What they want to do:
§ Build applications that leverage advanced analytics in partnership with the data scientist and data engineer
§ Follow agile design methodologies
§ Optimize performance and meet SLAs

How Spark can help:
§ Supports the top analytics app languages such as Python and Scala
§ Eliminates programming complexity with libraries such as MLlib and simplifies DevOps
§ Makes it easy to embed advanced analytics into applications

Page 14

Use Case Scenarios

§ Use case and role based approach to understand entry points and required integration between various components

§ Identified 4 core use case scenarios:

1. SQL on “Open Source” data stores and data in DB2 for z/OS
2. Using “Open Source” for information integration purposes
3. Leveraging “Open Source” for data exploration, data mining, ML
4. Performing SystemT & NLP type analytics on non-structured data

§ Use cases require different integration points and leverage different Spark capabilities, e.g.
- Spark SQL
- Spark MLlib

Page 15

What is a Data Lake?


Page 17

A group of repositories that are managed, governed, protected and connected by metadata, and that provide self-service access.

The most important and distinguishing element of a Data Lake is Governance.

A Data Lake is not an Enterprise Data Warehouse or Ad Hoc Data Mart(s).

It is a methodology to build Analytical repositories over ALL data in a manner which:

• documents their contents (for re-use),
• provides data lineage back to source systems, and
• allows using the best tool for the job (repository, engine, UI, etc.)

IBM’s Data Lake Terminology
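The three properties listed on this slide (documented contents, lineage back to source systems, freedom to choose the repository) can be pictured as fields of a catalog entry. The following is a minimal sketch under stated assumptions: `CatalogEntry`, `lineage` and all entry names are hypothetical and do not describe an IBM product interface.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    description: str   # documents the contents for re-use
    repository: str    # where the data lives (best tool for the job)
    derived_from: list = field(default_factory=list)  # upstream entry names


def lineage(catalog, name):
    """Return the source systems (entries without upstream) feeding `name`."""
    entry = catalog[name]
    if not entry.derived_from:
        return [entry.name]
    sources = []
    for parent in entry.derived_from:
        for source in lineage(catalog, parent):
            if source not in sources:
                sources.append(source)
    return sources


# Hypothetical catalog content, for illustration only.
CATALOG = {
    "core_banking.transactions": CatalogEntry(
        "core_banking.transactions", "Posted account transactions", "DB2 for z/OS"),
    "social.mentions": CatalogEntry(
        "social.mentions", "Brand mentions with sentiment", "Hadoop"),
    "customer_360": CatalogEntry(
        "customer_360", "Joined customer view", "Accelerator",
        derived_from=["core_banking.transactions", "social.mentions"]),
    "churn_scores": CatalogEntry(
        "churn_scores", "Model output per customer", "Accelerator",
        derived_from=["customer_360"]),
}

SOURCES = lineage(CATALOG, "churn_scores")
```

Here `SOURCES` lists the two source entries that `churn_scores` ultimately depends on, which is the kind of question governance tooling needs to answer.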

Page 18

This...

...can easily become this

Started as a noble concept .. Data Lake .. resulted in Data Swamps

Page 19

IBM’s Data Lake = Efficient Management, Governance, Protection and Access.

Data Lake

Information Management and Governance Fabric

Data Lake Services

Data Lake Repositories

IBM’s Data Lake

Page 20

Data Lake (System of Insight)

Information Management and Governance Fabric

Catalog

Self-Service Access

Enterprise IT Data Exchange

Self-Service Access

Data Lake Repositories

The Data Lake Sub-Systems

Page 21

Data Lake (System of Insight)

Information Management and Governance Fabric

Catalog

Self-Service Access

Enterprise IT Data Exchange

Self-Service Access

Analytics Teams

Governance, Risk and Compliance Team

Information Curator

Line of Business Teams

Data Lake Operations

Enterprise IT

Other Data Lakes

Systems of Engagement

Data Lake Repositories

Systems of Automation

Systems of Record

New Sources

The Data Lake Users Supported

Page 22

Data Lake Deployment Patterns

1. DB2 for z/OS as a source
2. DB2 for z/OS as a data platform for the Data Lake
3. DB2 for z/OS as a consumer of insight
4. DB2 for z/OS as a downstream system

Diagram labels across the four pattern diagrams: Data Lake; Information Management and Governance Fabric; Data Lake Services; Data Lake Repositories; Enterprise IT Data Exchange; Enterprise IT; Analytics Teams; DB2 for z/OS (Operational); Accelerator (Analytical); Data In; Data Out; Information Service Calls

Page 23

Pattern 1: DB2 for z/OS as a Source

Diagram labels: Enterprise IT; Data Lake; Information Management and Governance Fabric; Data Lake Services; Data Lake Repositories; Enterprise IT Data Exchange; Analytics Teams; DB2 for z/OS (Operational); Accelerator (Analytical); Data In; Data Out

Deployment:
§ DB2 for z/OS data copied regularly into Data Lake
§ Analytic models built in Data Lake Operational History Repositories
§ Analytics discovery, exploration and modeling conducted on Data Lake data platform
§ New analytical models deployed in Accelerator for use by z Systems applications

Page 24

Pattern 2: DB2 for z/OS as a Data Lake Platform

Diagram labels: Data Lake; Information Management and Governance Fabric; Data Lake Services; Data Lake Repositories; Analytics Teams; DB2 for z/OS (Operational); Accelerator (Analytical)

Deployment:
§ DB2 for z/OS data defined as one of the Data Lake Data Platforms through schemas mapped to Data Lake’s catalog
§ Mapped schemas in Data Lake catalog selected for discovery, exploration and modeling of data in sandboxes
§ Data access of z Systems data enabled to pull data directly from DB2 z/OS or Accelerator into sandboxes
§ DB2 z/OS and Accelerator in this deployment logically sit inside the data lake and provide analytics to Analytic teams, e.g. Data Scientists.

Page 25

Pattern 3: DB2 for z/OS as a Consumer

Diagram labels: Data Lake; Information Management and Governance Fabric; Data Lake Services; Data Lake Repositories; Enterprise IT Data Exchange; Enterprise IT; DB2 for z/OS (Operational); Accelerator (Analytical); Information Service Calls

Deployment:
§ Supported Data Lake APIs or stored procedures called from z Systems applications to access additional data and insight generated by analytics running in data lake
§ Requirement: Data Lake must support availability requirements of z Systems platform

Page 26

Pattern 4: DB2 for z/OS as a Downstream System

Diagram labels: Data Lake; Information Management and Governance Fabric; Data Lake Services; Data Lake Repositories; Enterprise IT Data Exchange; Enterprise IT; DB2 for z/OS (Operational); Accelerator (Analytical); Data Out

Deployment:
§ Insights from Data Lake and selected supporting data fed to DB2 z/OS or Accelerator
§ Operation of z Systems and data lake decoupled in this deployment
§ z Systems analytical insights conducted locally, however there may be a delay between generated insights in the Data Lake and publishing it in the Accelerator

Page 27

Subject matter experts want access to their organization’s data to explore the content, select, control, annotate and access information using their terminology with an underpinning of protection and governance.

Data Scientists seeking data for new analytics models

Marketeer seeking data for new campaigns

Fraud investigator seeking data to understand the details of suspicious activity

• Day-to-day activity
• Requiring ad hoc access to a wide variety of data sources
• Supporting analysis and decision making
• Using the subject matter expert’s terminology

Providing the flexibility of spreadsheets that can scale to large volumes, a wide variety of information types whilst protecting sensitive information and optimizing data storage and provisioning.

Business Scenarios we see

Page 28

The hybrid computing platform on z Systems

Supports transaction processing and analytics workloads concurrently, efficiently and cost-effectively

Delivers industry leading performance for mixed workloads

The unique heterogeneous scale-out platform in the industry

Superior availability, reliability and security

Cloud and Mobile Enabled

Transaction Processing

Analytics Workload

IBM’s DB2 for z/OS and DB2 Analytics Accelerator

A self-managing, hybrid workload-optimized database management system that runs every query workload in the most efficient way, so that each query is executed in its optimal environment for greatest performance and cost efficiency

System z Hybrid Transaction/Analytical Processing
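The idea that each query runs in its optimal environment can be pictured as a routing decision between the operational and the analytical engine. The sketch below is a toy heuristic under loud assumptions: the `route` function, the keyword list and the `SCAN_THRESHOLD` value are invented for illustration, and the slides do not describe how the actual product decides.

```python
import re

# Hypothetical hints that a statement is analytical rather than transactional.
ANALYTIC_HINTS = re.compile(r"\b(GROUP\s+BY|SUM|AVG|COUNT|JOIN)\b", re.IGNORECASE)

# Hypothetical cut-off: reads expected to touch more rows count as scans.
SCAN_THRESHOLD = 100_000


def route(sql, estimated_rows):
    """Pick an engine name for a statement (illustrative heuristic only)."""
    is_read = sql.lstrip().upper().startswith("SELECT")
    looks_analytic = bool(ANALYTIC_HINTS.search(sql)) or estimated_rows > SCAN_THRESHOLD
    if is_read and looks_analytic:
        return "accelerator"  # analytical engine
    return "db2_zos"          # operational engine; all writes stay here


POINT_LOOKUP = route("SELECT balance FROM accounts WHERE id = 42", 1)
REPORT = route("SELECT region, SUM(amount) FROM tx GROUP BY region", 5_000_000)
UPDATE = route("UPDATE accounts SET balance = balance - 10 WHERE id = 42", 1)
```

The point of the sketch is only the shape of the decision: short keyed reads and all writes stay with the transactional system, while aggregations and large scans go to the analytical side.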

Page 29

Enterprise IT

Systems of Record: ATM, Loan, Deposit, …

Systems of Insight: Reporting, Analytics …

Systems of Engagement: Collaboration Systems and Portals, e-Mail, Mobile

DB2 Analytics Accelerator enables System of Insight Analytics for:

Reporting

Operational Analytics

Quick Model Refresh

z Systems of Record

Real Time Alerts

Reporting

Real-Time Predictive Scoring

optimization

Key Decisions, Constraints, Goals?

Data Data

Why z Systems?
§ Minimizes or eliminates data movement to other platforms for reporting.
§ Keeps secured data where it originates
§ Significantly reduces data latency times.
§ Exploits z Systems value proposition (RAS).
§ Exploits z13 optimization
§ Unlimited Scalability

Supports transactional and operational analytic systems on same platform.

z Systems within the Data Lake

Page 30

The Data Lake: Subsystems with Apache Spark

Data Lake (System of Insight)

Information Management and Governance Fabric

Catalog: Management, Governance, Protection

Self-Service Access

Enterprise IT Data Exchange

Self-Service Access

Analytics Teams: Analytics, DWH, …

Data Governance, Risk and Compliance Team

Information Curator

LoB Teams: Risk Modeling, Fraud Mgmt, …

Data Lake Operations

Enterprise IT

Other Data Lakes

Systems of Engagement: CC, e-Mail, Touchpoints, Notes, …

Systems of Automation

Systems of Record: ATM, Loan, Deposit, …

New Sources: Social Media, Twitter, …

Data Usage

Data Lake Repositories: Hadoop (non-structured); Other Repositories (e.g. DB2 LUW); Teradata, Exadata; DB2 for z/OS with Accelerator (IMS, VSAM, …)

Systems of Insight: Reporting, Analytics …

Page 31

The Data Lake: Subsystems with Apache Spark

Same diagram and labels as on the previous page, with one change: the first Self-Service Access component is now labeled “Self-Service Access – e.g. Spark”.

Page 32

Imperatives for implementing a successful Data Lake

• Reduce complexity of information supply chain, e.g.
§ Avoid data movement
§ Simplify data transformation
§ Use in-DB transformation
§ Use temporary table structures

• Leverage state-of-the-art technology, e.g.
§ HW accelerators
§ Special-purpose appliances
§ In-memory processing

• Use federation techniques whenever possible, e.g.
§ Federated SQL queries, leaving data in place
§ Federated analytical processing, leaving data in place

• Adhere to innovative and novel BI/DWH concepts, e.g.
§ Limit number of materialized data marts and data cubes
§ Use aggregation on the fly
§ Allow for agile usage patterns

These imperatives align well with the strengths of z Systems
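Two of these imperatives, federated SQL that leaves data in place and aggregation on the fly instead of materialized data marts, can be illustrated with a small stand-in. The sketch below uses Python's built-in `sqlite3` module with an attached second database to play the role of two separate stores; it is not DB2, the Accelerator or Spark SQL, and the schema names, table names and rows are assumptions made up for the example.

```python
import sqlite3


def build_stores():
    """Create two separate in-memory databases on one connection."""
    conn = sqlite3.connect(":memory:")
    # The attached database stands in for a second, remote data store.
    conn.execute("ATTACH DATABASE ':memory:' AS social")
    conn.execute("CREATE TABLE main.transactions (customer_id INTEGER, amount REAL)")
    conn.execute("CREATE TABLE social.mentions (customer_id INTEGER, sentiment REAL)")
    conn.executemany("INSERT INTO main.transactions VALUES (?, ?)",
                     [(1, 100.0), (1, 50.0), (2, 20.0)])
    conn.executemany("INSERT INTO social.mentions VALUES (?, ?)",
                     [(1, 0.5), (2, -0.5), (2, -1.0)])
    return conn


# One query spans both stores; aggregates are computed at query time,
# so no summary table is copied or materialized anywhere.
FEDERATED_QUERY = """
SELECT t.customer_id, t.total_amount, m.avg_sentiment
FROM (SELECT customer_id, SUM(amount) AS total_amount
      FROM main.transactions GROUP BY customer_id) AS t
JOIN (SELECT customer_id, AVG(sentiment) AS avg_sentiment
      FROM social.mentions GROUP BY customer_id) AS m
  ON m.customer_id = t.customer_id
ORDER BY t.customer_id
"""

CONN = build_stores()
ROWS = CONN.execute(FEDERATED_QUERY).fetchall()
```

Each source table stays in its own database, and the per-customer totals and averages exist only for the duration of the query.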
