Leveraging Mainframe Data for Modern Analytics
TRANSCRIPT
Today’s Speakers
Jordan Martz, Director of Technology Solutions, Attunity
David Tucker, Director of Partner Engineering, Confluent
Keith Reid, Principal, Insights and Data: Client Engagement and Practice Leader, Capgemini
Agenda
• A quick history of the mainframe
• A quick history of data migration
• Attunity - data migration with CDC
• Confluent streaming platform powered by Apache Kafka™
• Putting it all together
• Use cases and Replicate demo
• Answering your questions
Image: © Mark Richards
History of the Mainframe
Big businesses with big needs required big computers. Demand increased just as “second generation” transistor-based computers were replacing vacuum-tube machines in the late 1950s, spurring developments in hardware and software. Manufacturers commonly built small numbers of each model, targeting narrowly defined markets.
Why are they called “mainframes”? Nobody knows for sure. There was no mainframe “inventor” who coined the term. Probably “main frame” originally referred to the frames (designed for telephone switches) holding processor circuits and main memory, separate from racks or cabinets holding other components. Over time, “main frame” became “mainframe” and came to mean “big computer.”
Source: The Computer Museum
Source: © International Business Machines Corporation (IBM), 1965
Death of the mainframe?
What Became of Mainframes?
“Mainframes will soon be extinct,” pundits have announced regularly. Yet nobody told the mainframes, which remain alive and well, the backbone of world banking and other business systems. Reliable and secure, mainframes are seldom in the limelight. But one probably approved your last ATM withdrawal or reserved your last airplane ticket.
Source: The Computer Museum
Source: © International Business Machines Corporation (IBM), 2001
A quick history of data movement
[Diagram: batch extracts move data from Source to Data Warehouse, with Source History rebuilt on each load]
A quick history of data movement
[Diagram: CDC moves changes from Source to Data Warehouse, preserving Source History]
A quick history of data movement – the ODS
[Diagram: Source → CDC → ODS (latest view)]
This all changes with streaming / big data platforms (e.g., Kafka and Hadoop)
[Diagram: Source → CDC → Streaming Platform → Data Lake (Source History, Point in Time, End of Day), In-Memory Analytics (latest view and events), and CEP]
So why does CDC work in a Big Data world?
Big Data likes volume and likes history
• Storage isn't an issue
• History helps machine learning

Re-creating any point in time is simple
• "8 lines of Scala code" simple (see the sketch below)

Easiest way to get data without large system performance impacts
• Reduces concerns about data integration

Enables very rapid response to transactional events
• Fraud detection and even consumer response become much simpler
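To make the point-in-time claim concrete, here is a minimal sketch in plain Scala. The ChangeRecord shape, field names, and sample data are hypothetical stand-ins for whatever a CDC tool actually emits: for each key, keep the latest change at or before the requested timestamp, then drop keys whose last operation was a delete.

  // Hypothetical change-record shape; real CDC payloads will differ.
  case class ChangeRecord(key: String, ts: Long, op: String, value: String)

  object PointInTime {
    // Rebuild the table as of `asOf`: latest change per key, deletes removed.
    def reconstruct(history: Seq[ChangeRecord], asOf: Long): Map[String, String] =
      history
        .filter(_.ts <= asOf)
        .groupBy(_.key)
        .flatMap { case (key, changes) =>
          val last = changes.maxBy(_.ts)
          if (last.op == "DELETE") None else Some(key -> last.value)
        }

    def main(args: Array[String]): Unit = {
      val history = Seq(
        ChangeRecord("acct-1", 100L, "INSERT", "balance=50"),
        ChangeRecord("acct-1", 200L, "UPDATE", "balance=75"),
        ChangeRecord("acct-2", 150L, "INSERT", "balance=10"),
        ChangeRecord("acct-2", 250L, "DELETE", ""))
      // State as of t=220: acct-1 -> balance=75, acct-2 -> balance=10
      println(reconstruct(history, asOf = 220L))
    }
  }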
Attunity Platform for Enterprise Data Management
• Attunity Replicate – Universal Data Availability: integrate new platforms
• Attunity Compose – Data Warehouse Automation: automate ETL/EDW
• Attunity Visibility – Metrics-Driven Data Management: optimize performance and cost
On premises / cloud, spanning Hadoop, files, RDBMS, EDW, SAP, and mainframe
Attunity Replicate
• No manual coding or scripting
• Automated end-to-end
• Optimized and configurable

Capabilities:
• Target schema creation
• Heterogeneous data type mapping
• Batch to CDC transition
• DDL change propagation
• Filtering
• Transformations

[Diagram: sources (Hadoop, files, RDBMS, EDW, mainframe) → Attunity Replicate → targets (Hadoop, files, RDBMS, EDW, Kafka)]
Data replication and ingest made easy
Zero-footprint Architecture
Lower impact on IT
• No software agents on sources and targets for mainstream databases
• Replicate data from hundreds of source systems with easy configuration
• No software upgrades required at each database source or target
• Log-based capture with source-specific optimizations

[Diagram: sources (Hadoop, files, RDBMS, EDW, mainframe) → Attunity Replicate → targets (Hadoop, files, RDBMS, EDW, Kafka)]
Heterogeneous – Broad support for sources and targets
Sources
• RDBMS: Oracle, SQL Server, DB2 LUW, DB2 iSeries, DB2 z/OS, MySQL, Sybase ASE, Informix
• Data Warehouse: Exadata, Teradata, Netezza, Vertica, Actian Vector, Actian Matrix
• Hadoop: Hortonworks, Cloudera, MapR, Pivotal
• Legacy: IMS/DB, SQL M/P, Enscribe, RMS, VSAM
• Cloud: AWS RDS, Salesforce

Targets
• RDBMS: Oracle, SQL Server, DB2 LUW, MySQL, PostgreSQL, Sybase ASE, Informix
• Data Warehouse: AWS Redshift, Azure SQL DW, Exadata, Teradata, Netezza, Vertica, Pivotal DB (Greenplum), Pivotal HAWQ, Actian Vector, Actian Matrix, Sybase IQ
• Hadoop: Hortonworks, Cloudera, MapR, Pivotal
• NoSQL: MongoDB
• Cloud: AWS RDS/Redshift/S3, Azure SQL Data Warehouse, Azure SQL Database, Google Cloud SQL, Google Cloud Dataproc
• Message Broker: Kafka

Effective: 12/10/2015
Real-time migration of mainframe data
Confluent: Open source enterprise streaming built on Apache Kafka
[Diagram: Confluent Platform components, labeled open source, commercial, or external]
• External systems and apps: monitoring, analytics, custom apps, transformations, real-time applications, CRM, data warehouse, database, Hadoop, data integration, mainframe, …
• Open source: Apache Kafka (Kafka Core, Kafka Connect, Kafka Streams), supported connectors, clients, Schema Registry, REST Proxy
• Commercial: Control Center, auto-data balancing, multi-data center replication, 24/7 support
• Inbound event types: database changes, log events, IoT data, web events, …
From Big Data to Stream Data
• Big Data was: the more the better
• Stream Data is: the faster the better
• Stream Data can be: big or fast (Lambda)
• Stream Data will be: big AND fast (Kappa)
Apache Kafka is the enabling technology of this transition
[Charts: value of data vs. volume of data; value of data vs. age of data. Diagrams: Lambda architecture (streams and Hadoop feeding a speed table and a batch table in a DB) vs. Kappa architecture (streams feeding jobs and tables in a DB)]
Apache Kafka™ Connect: Effective Streaming Data Capture
Apache Kafka™ Connect – Streaming Data Capture
[Diagram: source connectors (JDBC, Mongo, MySQL) feed a Kafka pipeline through the Kafka Connect API; sink connectors deliver to Elastic, Cassandra, and HDFS]
• Fault tolerant
• Manages hundreds of data sources and sinks
• Preserves data schema
• Part of the Apache Kafka project
• Integrated within Confluent Platform's Control Center
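As a concrete example, a sink connector is just a small JSON document posted to a Connect worker's REST API. This sketch uses Kafka's built-in FileStreamSinkConnector; the connector name, topic, and file path are hypothetical.

  {
    "name": "cdc-file-sink",
    "config": {
      "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
      "tasks.max": "1",
      "topics": "attunity.cdc.orders",
      "file": "/tmp/cdc-orders.txt"
    }
  }

POSTing this to the worker's /connectors endpoint starts a task that drains the topic to the file; swapping in a JDBC or HDFS sink connector class follows the same pattern.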
Kafka Connect Library of Connectors
* Denotes connectors developed at Confluent and distributed with the Confluent Platform; extensive validation and testing has been performed.
[Grid of connector logos in four categories: Databases*, Datastore/File Store*, Analytics*, Applications/Other]
Apache Kafka™ Streams: Distributed Stream Processing Made Easy
Architecture of Kafka Streams, a Part of Apache Kafka
[Diagram: producers write to topics in a Kafka cluster; Kafka Streams applications and other consumers read from them]

Key benefits
• No additional cluster
• Easy to run as a service
• Supports large aggregations and joins
• Security and permissions fully integrated from Kafka

Example use cases
• Microservices
• Continuous queries
• Continuous transformations
• Event-triggered processes
Kafka Streams: the Easiest Way to Process Data in Apache Kafka™
Example use cases
• Microservices
• Large-scale continuous queries and transformations
• Event-triggered processes
• Reactive applications
• Customer 360-degree view, fraud detection, location-based marketing, smart electrical grids, fleet management, …

Key benefits of Apache Kafka's Streams API
• Build apps, not clusters: no additional cluster required
• Elastic, highly performant, distributed, fault-tolerant, secure
• Equally viable for small, medium, and large-scale use cases
• "Run everywhere": integrates with your existing deployment strategies such as containers, automation, cloud

[Diagram: Kafka Streams is a library embedded in your app]
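To show how little is involved, here is a minimal Kafka Streams application sketch using the Scala DSL (kafka-streams-scala). The topic names cdc.accounts and accounts.enriched and the uppercase transform are placeholder assumptions; the Serdes import path differs slightly across Kafka versions.

  import java.util.Properties
  import org.apache.kafka.streams.{KafkaStreams, StreamsConfig}
  import org.apache.kafka.streams.scala.StreamsBuilder
  import org.apache.kafka.streams.scala.ImplicitConversions._
  import org.apache.kafka.streams.scala.serialization.Serdes._  // pre-2.7 Kafka: ...scala.Serdes._

  object CdcTransformApp extends App {
    val props = new Properties()
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "cdc-transform")
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")

    val builder = new StreamsBuilder()
    // Read change records, drop empties, apply a stand-in transformation.
    builder.stream[String, String]("cdc.accounts")
      .filter((_, v) => v != null && v.nonEmpty)
      .mapValues(_.toUpperCase)
      .to("accounts.enriched")

    val streams = new KafkaStreams(builder.build(), props)
    streams.start()                       // runs until the JVM exits
    sys.addShutdownHook(streams.close())  // clean shutdown
  }

No separate processing cluster is required: this is an ordinary JVM application that can be deployed like any other service.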
Architecture Example
Before: complexity for development and operations, heavy footprint
1. Capture business events in Kafka
2. Must process events with separate, special-purpose clusters (your processing job)
3. Write results back to Kafka
Architecture Example
With Kafka Streams: app-centric architecture that blends well into your existing infrastructure
1. Capture business events in Kafka
2. Process events fast, reliably, and securely with standard Java applications (your app embeds Kafka Streams)
3a. Write results back to Kafka
3b. External apps can directly query the latest results
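Step 3b relies on Kafka Streams' interactive queries: other code can read the latest state for a key straight out of a running application's state store. A minimal sketch, assuming the topology materializes a store named "customer-state"; the API shown is from newer Kafka clients (older versions use a streams.store(name, type) overload).

  import org.apache.kafka.streams.{KafkaStreams, StoreQueryParameters}
  import org.apache.kafka.streams.state.QueryableStoreTypes

  object LatestState {
    // Look up the current value for `key` in a running Streams app's store.
    def latestFor(streams: KafkaStreams, key: String): Option[String] = {
      val store = streams.store(
        StoreQueryParameters.fromNameAndType(
          "customer-state", QueryableStoreTypes.keyValueStore[String, String]()))
      Option(store.get(key))  // None when the key has no current state
    }
  }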
Putting it all together: CDC with Attunity on Confluent Enterprise
Back to the high-level platform integration …
[Diagram: Mainframe → CDC → Streaming Platform → Data Lake (Source History, Point in Time, End of Day), In-Memory Analytics (latest view and events), and CEP]
… made real in Attunity / Confluent Data Flow
Topic Data Flow
• Attunity publishes DB changes to Kafka
• "Raw" connectors (e.g., FileSink or HDFS) persist change records where needed
• A Kafka Streams app reads the CDC topic and transforms it (as necessary) for other data systems
• Sink connectors (JDBC or key-value, as needed) persist the transformed data for other uses

[Diagram: Attunity Replicate produces change records into a Kafka cluster; a raw sink archives them, Kafka Streams transforms them, and a data-system sink delivers the results]
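Before wiring up sinks, it helps to eyeball what Replicate is actually publishing. Below is a minimal sketch of a plain Kafka consumer doing just that; the topic name attunity.cdc.orders is an assumption, and real Replicate messages may be JSON or Avro envelopes rather than plain strings.

  import java.time.Duration
  import java.util.{Collections, Properties}
  import org.apache.kafka.clients.consumer.KafkaConsumer
  import scala.jdk.CollectionConverters._  // Scala 2.13+; use JavaConverters on 2.12

  object CdcTopicInspector extends App {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("group.id", "cdc-inspector")
    props.put("key.deserializer",
      "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer",
      "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList("attunity.cdc.orders"))
    while (true) {
      // Each message is one change record published by Replicate.
      for (rec <- consumer.poll(Duration.ofSeconds(1)).asScala)
        println(s"${rec.key} -> ${rec.value}")
    }
  }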
Use Cases
Query off-load
• Mainframe system accepts operational updates
• Attunity CDC publishes table updates to Kafka
• Certified Confluent connectors replicate tables to other data systems for read-only queries
Business value: greater analytics flexibility at lower cost, without disrupting the operational system

Enhanced security
• Mainframe audit trails published to Kafka
• Syslog and other access events published to other topics
• Event correlation via Logstash or similar tools
Business value: enhanced threat detection and end-to-end workflow auditing

Cross-system integration
• A Kafka Streams application joins customer data from the mainframe with customer-specific mobile information (see the sketch below)
• External applications use interactive queries to leverage up-to-the-second customer state
Business value: improved customer engagement, more efficient marketing spend
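A minimal sketch of that cross-system integration topology, again using the Scala DSL; the topic names, string value types, and the string-concatenation "join" are illustrative assumptions (a real application would join structured records):

  import java.util.Properties
  import org.apache.kafka.streams.{KafkaStreams, StreamsConfig}
  import org.apache.kafka.streams.scala.StreamsBuilder
  import org.apache.kafka.streams.scala.ImplicitConversions._
  import org.apache.kafka.streams.scala.serialization.Serdes._

  object Customer360App extends App {
    val props = new Properties()
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "customer-360")
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")

    val builder = new StreamsBuilder()
    // Latest mainframe customer state, materialized from the CDC topic.
    val customers = builder.table[String, String]("cdc.customers")
    // Mobile events keyed by the same customer id, enriched with that state.
    builder.stream[String, String]("mobile.events")
      .join(customers)((event, customer) => s"$customer | $event")
      .to("customer360.enriched")

    val streams = new KafkaStreams(builder.build(), props)
    streams.start()
    sys.addShutdownHook(streams.close())
  }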
Attunity Replicate Demo
Thanks!
Any questions?

References:
• http://discover.attunity.com/knowledge-brief-leveraging-mainframe-data-for-modern-analytics.html
• http://confluent.io/product/connectors
• https://www.capgemini.com/resources/video/transform-to-a-modern-data-landscape