Leveraging Mainframe Data for Modern Analytics
TRANSCRIPT
Today’s Speakers
Jordan Martz, Director of Technology Solutions, Attunity
David Tucker, Director of Partner Engineering, Confluent
Keith Reid, Principal, Insights and Data: Client Engagement and Practice Leader, Capgemini
Agenda
• A quick history of the mainframe
• A quick history of data migration
• Attunity - data migration with CDC
• Confluent streaming platform powered by Apache Kafka™
• Putting it all together
• Use cases and Replicate demo
• Answering your questions
Image: © Mark Richards
History of the Mainframe
Big businesses with big needs required big computers. Demand increased just as “second generation” transistor-based computers were replacing vacuum-tube machines in the late 1950s, spurring developments in hardware and software. Manufacturers commonly built small numbers of each model, targeting narrowly defined markets.
Why are they called “mainframes”? Nobody knows for sure. There was no mainframe “inventor” who coined the term. Probably “main frame” originally referred to the frames (designed for telephone switches) holding processor circuits and main memory, separate from racks or cabinets holding other components. Over time, “main frame” became “mainframe” and came to mean “big computer.”
Source: The Computer Museum
Source: © International Business Machines Corporation (IBM), 1965
Death of the mainframe?
What Became of Mainframes?
“Mainframes will soon be extinct,” pundits have announced regularly. Yet nobody told the mainframes, which remain alive and well, the backbone of world banking and other business systems. Reliable and secure, mainframes are seldom in the limelight. But one probably approved your last ATM withdrawal or reserved your last airplane ticket.
Source: The Computer Museum
Source: © International Business Machines Corporation (IBM), 2001
A quick history of data movement
[Diagram: batch extracts move data from Source to Data Warehouse, with Source History rebuilt on each load]
A quick history of data movement
[Diagram: CDC moves changes from Source to Data Warehouse, preserving Source History]
A quick history of data movement – the ODS
[Diagram: Source → CDC → ODS (latest view)]
This all changes with streaming / big data platforms (e.g., Kafka and Hadoop)
[Diagram: Source → CDC → Streaming Platform → Data Lake (Source History, Point in Time, End of Day), In-Memory Analytics (latest view and events), and CEP]
So why does CDC work in a Big Data world?
Big Data likes volume and likes history
• Storage isn't an issue
• History helps machine learning

Re-creating any point in time is simple
• "8 lines of Scala code" simple (see the sketch below)

Easiest way to get data without large system performance impacts
• Reduces concerns about data integration

Enables very rapid response to transactional events
• Fraud detection and even consumer response become much simpler
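To make the point-in-time claim concrete, here is a minimal sketch in plain Scala. The ChangeRecord shape, field names, and sample data are hypothetical stand-ins for whatever a CDC tool actually emits: for each key, keep the latest change at or before the requested timestamp, then drop keys whose last operation was a delete.

  // Hypothetical change-record shape; real CDC payloads will differ.
  case class ChangeRecord(key: String, ts: Long, op: String, value: String)

  object PointInTime {
    // Rebuild the table as of `asOf`: latest change per key, deletes removed.
    def reconstruct(history: Seq[ChangeRecord], asOf: Long): Map[String, String] =
      history
        .filter(_.ts <= asOf)
        .groupBy(_.key)
        .flatMap { case (key, changes) =>
          val last = changes.maxBy(_.ts)
          if (last.op == "DELETE") None else Some(key -> last.value)
        }

    def main(args: Array[String]): Unit = {
      val history = Seq(
        ChangeRecord("acct-1", 100L, "INSERT", "balance=50"),
        ChangeRecord("acct-1", 200L, "UPDATE", "balance=75"),
        ChangeRecord("acct-2", 150L, "INSERT", "balance=10"),
        ChangeRecord("acct-2", 250L, "DELETE", ""))
      // State as of t=220: acct-1 -> balance=75, acct-2 -> balance=10
      println(reconstruct(history, asOf = 220L))
    }
  }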
Attunity Platform for Enterprise Data Management
• Attunity Replicate – Universal Data Availability: integrate new platforms
• Attunity Compose – Data Warehouse Automation: automate ETL/EDW
• Attunity Visibility – Metrics-Driven Data Management: optimize performance and cost
On premises / cloud, spanning Hadoop, files, RDBMS, EDW, SAP, and mainframe
Attunity Replicate
• No manual coding or scripting
• Automated end-to-end
• Optimized and configurable

Capabilities:
• Target schema creation
• Heterogeneous data type mapping
• Batch to CDC transition
• DDL change propagation
• Filtering
• Transformations

[Diagram: sources (Hadoop, files, RDBMS, EDW, mainframe) → Attunity Replicate → targets (Hadoop, files, RDBMS, EDW, Kafka)]
Data replication and ingest made easy
Zero-footprint Architecture
Lower impact on IT
• No software agents on sources and targets for mainstream databases
• Replicate data from hundreds of source systems with easy configuration
• No software upgrades required at each database source or target
• Log-based capture with source-specific optimizations

[Diagram: sources (Hadoop, files, RDBMS, EDW, mainframe) → Attunity Replicate → targets (Hadoop, files, RDBMS, EDW, Kafka)]
Heterogeneous – Broad support for sources and targets
Sources
• RDBMS: Oracle, SQL Server, DB2 LUW, DB2 iSeries, DB2 z/OS, MySQL, Sybase ASE, Informix
• Data Warehouse: Exadata, Teradata, Netezza, Vertica, Actian Vector, Actian Matrix
• Hadoop: Hortonworks, Cloudera, MapR, Pivotal
• Legacy: IMS/DB, SQL M/P, Enscribe, RMS, VSAM
• Cloud: AWS RDS, Salesforce

Targets
• RDBMS: Oracle, SQL Server, DB2 LUW, MySQL, PostgreSQL, Sybase ASE, Informix
• Data Warehouse: AWS Redshift, Azure SQL DW, Exadata, Teradata, Netezza, Vertica, Pivotal DB (Greenplum), Pivotal HAWQ, Actian Vector, Actian Matrix, Sybase IQ
• Hadoop: Hortonworks, Cloudera, MapR, Pivotal
• NoSQL: MongoDB
• Cloud: AWS RDS/Redshift/S3, Azure SQL Data Warehouse, Azure SQL Database, Google Cloud SQL, Google Cloud Dataproc
• Message Broker: Kafka

Effective: 12/10/2015
Real-time migration of mainframe data
Confluent: Open source enterprise streaming built on Apache Kafka
[Diagram: Confluent Platform components, labeled open source, commercial, or external]
• External systems and apps: monitoring, analytics, custom apps, transformations, real-time applications, CRM, data warehouse, database, Hadoop, data integration, mainframe, …
• Open source: Apache Kafka (Kafka Core, Kafka Connect, Kafka Streams), supported connectors, clients, Schema Registry, REST Proxy
• Commercial: Control Center, auto-data balancing, multi-data center replication, 24/7 support
• Inbound event types: database changes, log events, IoT data, web events, …
From Big Data to Stream Data
• Big Data was: the more the better
• Stream Data is: the faster the better
• Stream Data can be: big or fast (Lambda)
• Stream Data will be: big AND fast (Kappa)
Apache Kafka is the enabling technology of this transition
[Charts: value of data vs. volume of data; value of data vs. age of data. Diagrams: Lambda architecture (streams and Hadoop feeding a speed table and a batch table in a DB) vs. Kappa architecture (streams feeding jobs and tables in a DB)]
Apache Kafka™ Connect: Effective Streaming Data Capture
Apache Kafka™ Connect – Streaming Data Capture
[Diagram: source connectors (JDBC, Mongo, MySQL) feed a Kafka pipeline through the Kafka Connect API; sink connectors deliver to Elastic, Cassandra, and HDFS]
• Fault tolerant
• Manages hundreds of data sources and sinks
• Preserves data schema
• Part of the Apache Kafka project
• Integrated within Confluent Platform's Control Center
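As a concrete example, a sink connector is just a small JSON document posted to a Connect worker's REST API. This sketch uses Kafka's built-in FileStreamSinkConnector; the connector name, topic, and file path are hypothetical.

  {
    "name": "cdc-file-sink",
    "config": {
      "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
      "tasks.max": "1",
      "topics": "attunity.cdc.orders",
      "file": "/tmp/cdc-orders.txt"
    }
  }

POSTing this to the worker's /connectors endpoint starts a task that drains the topic to the file; swapping in a JDBC or HDFS sink connector class follows the same pattern.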
Kafka Connect Library of Connectors
* Denotes connectors developed at Confluent and distributed with the Confluent Platform; extensive validation and testing has been performed.
[Grid of connector logos in four categories: Databases*, Datastore/File Store*, Analytics*, Applications/Other]
Apache Kafka™ Streams: Distributed Stream Processing Made Easy
Architecture of Kafka Streams, a Part of Apache Kafka
[Diagram: producers write to topics in a Kafka cluster; Kafka Streams applications and other consumers read from them]

Key benefits
• No additional cluster
• Easy to run as a service
• Supports large aggregations and joins
• Security and permissions fully integrated from Kafka

Example use cases
• Microservices
• Continuous queries
• Continuous transformations
• Event-triggered processes
Kafka Streams: the Easiest Way to Process Data in Apache Kafka™
Example use cases
• Microservices
• Large-scale continuous queries and transformations
• Event-triggered processes
• Reactive applications
• Customer 360-degree view, fraud detection, location-based marketing, smart electrical grids, fleet management, …

Key benefits of Apache Kafka's Streams API
• Build apps, not clusters: no additional cluster required
• Elastic, highly performant, distributed, fault-tolerant, secure
• Equally viable for small, medium, and large-scale use cases
• "Run everywhere": integrates with your existing deployment strategies such as containers, automation, cloud

[Diagram: Kafka Streams is a library embedded in your app]
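To show how little is involved, here is a minimal Kafka Streams application sketch using the Scala DSL (kafka-streams-scala). The topic names cdc.accounts and accounts.enriched and the uppercase transform are placeholder assumptions; the Serdes import path differs slightly across Kafka versions.

  import java.util.Properties
  import org.apache.kafka.streams.{KafkaStreams, StreamsConfig}
  import org.apache.kafka.streams.scala.StreamsBuilder
  import org.apache.kafka.streams.scala.ImplicitConversions._
  import org.apache.kafka.streams.scala.serialization.Serdes._  // pre-2.7 Kafka: ...scala.Serdes._

  object CdcTransformApp extends App {
    val props = new Properties()
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "cdc-transform")
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")

    val builder = new StreamsBuilder()
    // Read change records, drop empties, apply a stand-in transformation.
    builder.stream[String, String]("cdc.accounts")
      .filter((_, v) => v != null && v.nonEmpty)
      .mapValues(_.toUpperCase)
      .to("accounts.enriched")

    val streams = new KafkaStreams(builder.build(), props)
    streams.start()                       // runs until the JVM exits
    sys.addShutdownHook(streams.close())  // clean shutdown
  }

No separate processing cluster is required: this is an ordinary JVM application that can be deployed like any other service.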
Architecture Example
Before: complexity for development and operations, heavy footprint
1. Capture business events in Kafka
2. Must process events with separate, special-purpose clusters (your processing job)
3. Write results back to Kafka
Architecture Example
With Kafka Streams: app-centric architecture that blends well into your existing infrastructure
1. Capture business events in Kafka
2. Process events fast, reliably, and securely with standard Java applications (your app embeds Kafka Streams)
3a. Write results back to Kafka
3b. External apps can directly query the latest results
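Step 3b relies on Kafka Streams' interactive queries: other code can read the latest state for a key straight out of a running application's state store. A minimal sketch, assuming the topology materializes a store named "customer-state"; the API shown is from newer Kafka clients (older versions use a streams.store(name, type) overload).

  import org.apache.kafka.streams.{KafkaStreams, StoreQueryParameters}
  import org.apache.kafka.streams.state.QueryableStoreTypes

  object LatestState {
    // Look up the current value for `key` in a running Streams app's store.
    def latestFor(streams: KafkaStreams, key: String): Option[String] = {
      val store = streams.store(
        StoreQueryParameters.fromNameAndType(
          "customer-state", QueryableStoreTypes.keyValueStore[String, String]()))
      Option(store.get(key))  // None when the key has no current state
    }
  }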
Putting it all together: CDC with Attunity on Confluent Enterprise
Back to the high-level platform integration …
[Diagram: Mainframe → CDC → Streaming Platform → Data Lake (Source History, Point in Time, End of Day), In-Memory Analytics (latest view and events), and CEP]
… made real in Attunity / Confluent Data Flow
Topic Data Flow
• Attunity publishes DB changes to Kafka
• "Raw" connectors (e.g., FileSink or HDFS) persist change records where needed
• A Kafka Streams app reads the CDC topic and transforms it (as necessary) for other data systems
• Sink connectors (JDBC or key-value, as needed) persist the transformed data for other uses

[Diagram: Attunity Replicate produces change records into a Kafka cluster; a raw sink archives them, Kafka Streams transforms them, and a data-system sink delivers the results]
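Before wiring up sinks, it helps to eyeball what Replicate is actually publishing. Below is a minimal sketch of a plain Kafka consumer doing just that; the topic name attunity.cdc.orders is an assumption, and real Replicate messages may be JSON or Avro envelopes rather than plain strings.

  import java.time.Duration
  import java.util.{Collections, Properties}
  import org.apache.kafka.clients.consumer.KafkaConsumer
  import scala.jdk.CollectionConverters._  // Scala 2.13+; use JavaConverters on 2.12

  object CdcTopicInspector extends App {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("group.id", "cdc-inspector")
    props.put("key.deserializer",
      "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer",
      "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList("attunity.cdc.orders"))
    while (true) {
      // Each message is one change record published by Replicate.
      for (rec <- consumer.poll(Duration.ofSeconds(1)).asScala)
        println(s"${rec.key} -> ${rec.value}")
    }
  }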
Use Cases
Query off-load
• Mainframe system accepts operational updates
• Attunity CDC publishes table updates to Kafka
• Certified Confluent connectors replicate tables to other data systems for read-only queries
Business value: greater analytics flexibility at lower cost, without disrupting the operational system

Enhanced security
• Mainframe audit trails published to Kafka
• Syslog and other access events published to other topics
• Event correlation via Logstash or similar tools
Business value: enhanced threat detection and end-to-end workflow auditing

Cross-system integration
• A Kafka Streams application joins customer data from the mainframe with customer-specific mobile information (see the sketch below)
• External applications use interactive queries to leverage up-to-the-second customer state
Business value: improved customer engagement, more efficient marketing spend
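A minimal sketch of that cross-system integration topology, again using the Scala DSL; the topic names, string value types, and the string-concatenation "join" are illustrative assumptions (a real application would join structured records):

  import java.util.Properties
  import org.apache.kafka.streams.{KafkaStreams, StreamsConfig}
  import org.apache.kafka.streams.scala.StreamsBuilder
  import org.apache.kafka.streams.scala.ImplicitConversions._
  import org.apache.kafka.streams.scala.serialization.Serdes._

  object Customer360App extends App {
    val props = new Properties()
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "customer-360")
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")

    val builder = new StreamsBuilder()
    // Latest mainframe customer state, materialized from the CDC topic.
    val customers = builder.table[String, String]("cdc.customers")
    // Mobile events keyed by the same customer id, enriched with that state.
    builder.stream[String, String]("mobile.events")
      .join(customers)((event, customer) => s"$customer | $event")
      .to("customer360.enriched")

    val streams = new KafkaStreams(builder.build(), props)
    streams.start()
    sys.addShutdownHook(streams.close())
  }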
Attunity Replicate Demo
Thanks!
Any questions?

References:
• http://discover.attunity.com/knowledge-brief-leveraging-mainframe-data-for-modern-analytics.html
• http://confluent.io/product/connectors
• https://www.capgemini.com/resources/video/transform-to-a-modern-data-landscape