data integration for big data (oow 2016, co-presented with oracle)
TRANSCRIPT
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
CON6624 Oracle Data Integration Platform: A Cornerstone for Big Data
Christophe Dupupet (@XofDup), Director | A-Team
Mark Rittman (@markrittman), Independent Analyst
Julien Testut (@JulienTestut), Senior Principal Product Manager
September 2016
Confidential – Oracle Internal/Restricted/Highly Restricted
Safe Harbor StatementThe following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Oracle Confidential 1
Agenda
1. Oracle Data Integration for Big Data
2. Big Data Patterns
3. A Practitioner’s View on Oracle Data Integration for Big Data
4. Q & A
Five Core Capabilities
1. Business Continuity: DATA ALWAYS AVAILABLE
2. Data Movement: DATA ANYWHERE IT’S NEEDED
3. Data Transformation: DATA ACCESSIBLE IN ANY FORMAT
4. Data Governance: DATA THAT CAN BE TRUSTED
5. Streaming Data: DATA IN MOTION OR AT REST
Eight Core Products
Cloud or On-Premise
Most Innovative Technology
#1 Realtime / Streaming Data Integration Tool
#1 Pushdown / E-LT Data Integration Tool
1st to certify replication with Streaming Big Data
1st to certify an E-LT tool with Apache Spark/Python
1st to power Data Preparation with ML + NLP + Graph Data
1st to offer a Self-Service & Hybrid Cloud solution
Hybrid Open Source: Open Source at the core of the speed & batch processing engines, Enterprise Vendor tools for connecting to existing IT systems, and Cloud Platforms for the data fabric.
Diagram: Business | Data Serving Layer | Apps | Analytics | Batch Layer | Speed Layer | Data Streams | Social and Logs | Enterprise Data | Highly Available Databases | Pub / Sub | REST APIs | NoSQL | Bulk Data | Raw Data Stream Processing | Batch Processing | Prepared Data
Examples
Reference Architecture
Diagram: Business Data | Serving Layer | Apps | Analytics | Batch Layer | Speed Layer | Data Streams | Social and Logs | Enterprise Data | Highly Available Databases | Pub / Sub | REST APIs | NoSQL | Bulk Data
Products: GoldenGate | Data Preparation | Data Quality, Metadata Management & Business Glossary | Oracle Data Integrator | Active Data Guard
A comprehensive architecture covers five key areas: 1. Data Ingestion, 2. Data Preparation & Transformation, 3. Streaming Big Data, 4. Parallel Connectivity, and 5. Data Governance. Oracle Data Integration has them all covered.
Dataflow ML
Stream Analytics
Connectors
Oracle GoldenGate
Realtime Performance
Extensible & Flexible
Proven & Reliable
Oracle GoldenGate provides low-impact capture, routing, transformation, and delivery of database transactions across homogeneous and heterogeneous environments in real-time with no distance limitations.
Diagram: Most Databases | Data Events / Transaction Streams | Cloud | DBs | Big Data
Supports Databases, Big Data and NoSQL:
* The most popular enterprise integration tool in history
Diagram: Applications | Databus | Speed Layer | Batch Layer
Capture | Trail | Route | Deliver | Pump
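The Capture, Trail, Route, Deliver, and Pump stages above can be sketched conceptually. The following Python model is illustrative only: an in-memory stand-in for trail files and a target store, not the actual GoldenGate implementation.

```python
from collections import deque

# Hypothetical in-memory model of a change-data-capture pipeline:
# capture writes change records to a trail, and a delivery step
# applies them to a target store in commit order.
trail = deque()          # the "trail": an ordered log of change records
target = {}              # the target store, keyed by primary key

def capture(txn_log):
    """Capture committed transactions from a source log into the trail."""
    for record in txn_log:
        trail.append(record)

def deliver():
    """Route and deliver trail records to the target in commit order."""
    while trail:
        op, key, value = trail.popleft()
        if op in ("INSERT", "UPDATE"):
            target[key] = value
        elif op == "DELETE":
            target.pop(key, None)

capture([("INSERT", 1, "alice"), ("UPDATE", 1, "alicia"), ("DELETE", 2, None)])
deliver()
print(target)  # {1: 'alicia'}
```

Because the trail preserves commit order, replaying it on the target always reproduces the source state, which is the essence of log-based replication.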
GoldenGate for Ingest
Diagram: Streaming Analytics | Application | Serving Layer | REST Services | Visualization Tools | Reporting Tools | Data Marts | User Updates | DBMS Updates | GG | Applications | Speed Layer | Batch Layer | Platforms
Oracle Data Preparation
Self-Service: zero software to install, easy-to-use browser-based interface
Better Recommendations: better automation and less grunt work for humans
Built-in Data Graph: graph database of real-world facts used for enrichment
Diagram: Reporting Apps | Files | ETL
Oracle Data Preparation is a self-service tool that makes it simple to transform, prepare, enrich and standardize business data – it can help IT accelerate solutions for the Business by giving control of data formatting directly to data analysts.
MONTHS of effort spent on each new dataset
PROGRAMMERS writing scripts or complex ETL
DATA WRANGLING wastes time and money
“Big Data’s dirty little secret is that 90% of time spent on a project is devoted to preparing data… After all the preparation work, there isn’t enough time left to do sophisticated analytics on it…” Thomas H. Davenport
Diagram: Internet Logs | UNSTRUCTURED | STRUCTURED | Discovery & Visualization | Enterprise Reporting | Enterprise ETL & Data Integration | BUSINESS VALUE OPPORTUNITY | Weeks or Months | “I want my data!!”
BDP for Data Preparation
Oracle Data Integrator
Bulk Data Performance
Non Invasive Footprint
Future Proof IT Skills
Oracle Data Integrator provides high performance bulk data movement, massively parallel data transformation using database or big data technologies, and block-level data loading that leverages native data utilities
Diagram: Bulk Data Transformation | Most Apps, Databases & Cloud | Bulk Data Movement | Cloud | DBs | Big Data
• 1000’s of customers – more than other ETL tools
• Flexible E-LT workloads run anywhere: DBs, Big Data, Cloud
• Up to 2x faster batch processes and 3x more efficient tooling
ODI for Transformations: ETL Engines
Big Data Frameworks
Diagram: Speed Layer | Batch Layer | Serving Layer | Applications | Databus | Application | REST Services | Visualization Tools | Reporting Tools | Data Marts | User Updates | DBMS Updates | Oracle Data Integrator | Spark Streaming | Spark SQL | Sqoop | ERP | Oozie | Pig | Hive | Loaders | Kafka | NoSQL | OGG | SQL
• No ETL engine is required
• Separation of Logical and Physical design
• Physical execution on SQL, Hive, Pig, or Spark
• Runtime execution in Oozie or via the ODI Java Agent
• Rich set of pre-built operators
• User-defined functions
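The separation of logical and physical design can be illustrated with a toy example: the same logical mapping generates different physical execution code per engine. This is a hypothetical sketch of the idea behind ODI's pluggable Knowledge Modules, not ODI's actual code-generation API; all names are illustrative.

```python
# A logical mapping describes *what* to do; a generator per technology
# decides *how*. Swapping the generator retargets the same mapping at a
# different engine (the idea behind portable mappings).
mapping = {"source": "web_logs", "target": "sessions",
           "columns": ["user_id", "page"], "filter": "status = 200"}

def to_hive_sql(m):
    """Physical design 1: generate Hive SQL for the logical mapping."""
    cols = ", ".join(m["columns"])
    return (f"INSERT INTO TABLE {m['target']} "
            f"SELECT {cols} FROM {m['source']} WHERE {m['filter']}")

def to_spark(m):
    """Physical design 2: generate PySpark code for the same mapping."""
    cols = ", ".join(repr(c) for c in m["columns"])
    return (f"spark.table({m['source']!r}).filter({m['filter']!r})"
            f".select({cols}).write.insertInto({m['target']!r})")

print(to_hive_sql(mapping))
print(to_spark(mapping))
```

The logical mapping never changes; only the generator does, which is why an engine upgrade (say, Hive to Spark) does not force a redesign.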
Business Value of ODI: Only Tool with Portable Mappings
Business Friendly
Extreme Performance
Spatial Awareness
Oracle Stream Analytics
Diagram: DB | Web / Devices | Data / Event Data & Transaction Streams | Downstream (e.g. Hadoop)
Oracle Stream Analytics is a powerful analytic toolkit designed to work directly on data in motion – simple data correlations, complex event processing, geo-fencing, and advanced dashboards run on millions of events per second.
• Innovative dual model for Apache Spark or Coherence grid
• Simple-to-use spatial and geo-fencing features, an industry first
• Includes Oracle GoldenGate for streaming transactions
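Geo-fencing over an event stream amounts to testing each incoming position against a fence region. A minimal sketch in plain Python with a haversine distance check; the coordinates and the 5 km fence radius are illustrative, and this is not the Oracle Stream Analytics API.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def geofence_alerts(events, center, radius_km):
    """Yield device ids whose position falls inside the circular fence."""
    for device_id, lat, lon in events:
        if haversine_km(lat, lon, center[0], center[1]) <= radius_km:
            yield device_id

# Example: a 5 km fence around an arbitrary point in San Francisco.
events = [("car-1", 37.784, -122.401), ("car-2", 40.713, -74.006)]
print(list(geofence_alerts(events, (37.784, -122.401), 5.0)))  # ['car-1']
```

A streaming engine applies exactly this predicate per event, before the data ever hits disk.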
Stream or Batch Data
Spark-based Pipelines
ML-powered Profiling
Oracle Dataflow ML
Oracle Dataflow ML is a big data solution for stream and batch processing in a single environment – Lambda-based applications that can run streaming ETL for cloud-based analytic solutions.
• Batch and stream processing at the same time
• Machine learning guides users through data profiling
• Data movement across Oracle PaaS services
Diagram: Most Apps, Databases & Cloud | Bulk Data Movement | Streaming Data | Cloud | DBs | Big Data | Big Data Pipeline
Streaming Data
Diagram: from Devices | from Databases | Batch Layer | Speed Layer | Serving Layer | Applications | Databus | Devices | Application | REST Services | Visualization Tools | Reporting Tools | Data Marts | Oracle Stream Analytics | Oracle Dataflow ML | Oracle GoldenGate
Business Glossary
End-to-End Lineage
100+ Supported Systems
Oracle Metadata Management
Oracle Metadata Management provides an integrated toolkit that combines business glossary, workflow, metadata harvesting and rich data steward collaboration features.
Supports Databases, Big Data, ETL Tools, BI Tools etc:
BI Report Lineage
Taxonomy Lineage
Data Model Lineage
OEMM for Data Governance
Diagram: Data Catalog | Speed Layer | Batch Layer | Serving Layer | Applications | Databus | Application | REST Services | Visualization Tools | Reporting Tools | Data Marts | User Updates | DBMS Updates | Kafka (Generated Streaming) | Generated ETL Code (Sqoop) | OLTP Databases | HDFS Files | HCatalog | Hive | NoSQL | ETL Tools | Data Warehouses | BI Models | ER Models
Oracle Enterprise Metadata Management
140+ Supported Tools
Eight Core Products
Cloud or On-Premise
Agenda
1. Oracle Data Integration for Big Data
2. Big Data Patterns
3. A Practitioner’s View on Oracle Data Integration for Big Data
4. Q & A
Leverage Wide Range of Modern Analytic Styles
4 Business Patterns of Big Data Customer Adoption
Oracle Confidential, under Non-Disclosure 23
Diagram: DBMS (on-prem or cloud) | Sandbox | ETL Offload | Staging | Deep Data Storage
1. Analytic Data Sandbox:
– Stakeholder: Functional Line of Business (LoB)
– Core Value: Faster access to business data, faster time to value on Analytics
– Innovation: Schema-on-read empowers rapid data staging and true Data Discovery
2. ETL Offload:
– Stakeholder: Information Technology (IT)
– Core Value: Cost avoidance on DW/Marts
– Innovation: YARN/Hadoop empowers lower-cost compute and lower-cost storage
3. Deep Data Storage:
– Stakeholder: Risk / Compliance (LoB)
– Core Value: High-fidelity aged data
– Innovation: SQL-on-Hadoop engines enable very low cost, queryable data access
4. Streaming:
– Stakeholder: Marketing (LoB) / Telematics (LoB)
– Core Value: New Data Services or Higher Click Rates
– Innovation: MPP-capable streaming platforms combined with modern in-motion analytics
Data First Analytics | Model First Analytics | In-Motion Analytics | Streaming
Discovery, Exploratory and Visualization Style Analytics
• Oracle Endeca, Big Data Discovery
• Tableau, Qlik, Spotfire
• Datameer, etc.
Business Intelligence, Reporting and Dashboard Style Analytics
• Oracle BIEE, Visual Analyzer
• Cognos, SAS, MicroStrategy
• Business Objects, Actuate, etc.
Analytic Data Sandbox
1. Analytic Data Sandbox:
– Stakeholder: Functional Line of Business (LoB)
– Core Value: Faster access to business data, faster time to value on Analytics
– Innovation: Schema-on-read empowers rapid data staging and true Data Discovery
– Industries: All industries
Supports the “Data First” Style of Analytics
– No schema required
– Staging data is simple and fast
– Minimal data preparation required (mainly for un/semi-structured data sets)
Typical Customer Data Types / Sets
– Usually bringing in Structured Data from OLTP (Primary data is their existing Application data)
– Often bringing in Semi-Structured data (Secondary data is clickstream, logs, machine data)
– Business value is usually in the combination of the various data sets and the improved speed of discovery
Diagram: DBMS (on-prem or cloud) | Sandbox | ETL Offload | Staging | Data First Analytics | Model First Analytics
Often the data flow may not require any ETL tooling
Other data flows may still require ETL as a pipeline
BI Self-Service
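Schema-on-read, the innovation called out above, means raw records are stored untouched and a schema is projected onto them only at query time. A minimal sketch in illustrative Python, not tied to any particular Hadoop engine; the field names are made up.

```python
import json

# Raw clickstream lines land in storage untouched. Schema-on-write
# would have forced a table definition before loading; here we defer it.
raw_lines = [
    '{"user": "u1", "page": "/home", "ms": 120}',
    '{"user": "u2", "page": "/buy", "ms": 340, "promo": "X"}',  # extra field is fine
]

def read_with_schema(lines, fields):
    """Apply a schema at read time: project each record onto the
    requested fields, tolerating missing or extra attributes."""
    for line in lines:
        rec = json.loads(line)
        yield tuple(rec.get(f) for f in fields)

# Two different "schemas" over the same raw data, no reload required.
print(list(read_with_schema(raw_lines, ["user", "page"])))
print(list(read_with_schema(raw_lines, ["user", "promo"])))
```

This is why staging is fast in the sandbox pattern: loading is just a file copy, and every new question gets its own projection.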
ETL Offload
Diagram: DBMS (on-prem or cloud) | Sandbox | ETL Offload | Staging
2. ETL Offload:
– Stakeholder: Information Technology (IT)
– Core Value: Cost avoidance on DW/Marts
– Innovation: YARN/Hadoop empowers lower-cost compute and lower-cost storage
– Industries: Teradata, Netezza & Ab Initio customers
Supports the “Model First” Style of Analytics
– Schemas required (for working areas, sources and targets)
– Staging data requires modeled staging tables
– Data preparation required (mapping data sets; un/semi-structured data sets require pre-parsing)
Typical Customer Data Types / Sets
– Usually bringing in Structured Data from OLTP Apps (Primary data is their existing Application data)
– Occasionally adding new data types to the EDW schema (Secondary data is clickstream, logs, machine data)
– Business value is usually tied to the “cost avoidance” around escalating DW and ETL tooling costs
Data First Analytics | Model First Analytics
Primary Data Flow Requires Data Integration Tools
Deep Data Storage
Diagram: DBMS (on-prem or cloud) | Sandbox | ETL Offload | Staging | Deep Data Storage
3. Deep Data Storage:
– Stakeholder: Risk / Compliance (LoB)
– Core Value: High-fidelity aged data
– Innovation: SQL-on-Hadoop engines enable very low cost, queryable data access
– Industries: Insurance and Banking
Typically Deep Storage of Relational Data
– Schemas required (item detail records, not necessarily aggregates)
– Archival can be “on the way in” as part of routine loading, and also via “periodic” pruning from the EDW and data marts
Popular with SQL on Hadoop and Federation
– Teradata Query Grid from Teradata/Aster
– IBM BigSQL from Netezza/PureData
– Oracle Big Data SQL from Exadata
– Pivotal HAWQ from Greenplum
– Cisco Composite Software also selling on this use case (in addition to BI Virtualization)
Data First Analytics | Model First Analytics
Pattern mining | Compliance | Queryable Archive
Streaming Big Data Analytics
Diagram: DBMS (on-prem or cloud) | Sandbox | ETL Offload | Staging | Deep Data Storage
4. Streaming:
– Stakeholder: Marketing (LoB) / Telematics (LoB)
– Core Value: New Data Services or Higher Click Rates
– Innovation: MPP-capable streaming platforms combined with modern in-motion analytics
– Industries: Automotive, Aerospace, Industrial Manufacturing, some Energy/Oil & Gas
Decisions on Data Before It Hits Disk
– Data volume may be too high to persist all data: only save the important data
– Data may be highly repetitive (sensor data)
– Correlations may need to happen with very low latency requirements based on LoB demand
Key Use Case for “Data Monetization”
– Customers are standing up new Data Services (e.g. realtime equipment failure alerts and subscription-based monitoring)
– “Connected Car” services from most car makers
– Disaster preparedness centers: Energy/Aerospace
In-Motion Analytics | Streaming
Other data flows may still require ETL as a pipeline
Data First Analytics | Model First Analytics
Pattern mining
Some Common Themes Across Use Cases
1. Nearly 100% Analytic Use Cases
– Data Discovery directly in Hadoop
– ETL Offloading for analytics in a SQL DB
– Deep Data Storage for analytics in a SQL DB
– Streaming Analytics for data before it hits disk (Lambda Architecture)
2. Nearly all the Data is Structured Data
– OLTP Sources: every customer starts with the trusted data sets that already drive the majority of business value (App Data)
– New Sources: Clickstream Logs, Machine Data and other App Exhaust all have “structure” even if they may not have schema
3. Many more Sources are App/OLTP Sources
– By Quantity of Sources: most customers have many (dozens or hundreds of) App/OLTP sources they are bringing in
– By Volume: the amount of Machine Data or Log data may often exceed the OLTP data sets
4. Mainframes Matter
– High-Value Apps: most of the biggest customers are bringing mainframe (DB2/z, IMS, VSAM) data to Hadoop
5. Multiple Projects / Programs using Hadoop
– Larger Customers: most of the biggest customers have multiple Hadoop projects running in parallel; some are IT-led (DW/ETL Offload) and others are LoB-led (Discovery/Telematics)
6. Customers are Starting in Phases
– By Value: IT-led vs. LoB-led initiatives have different characteristics; even if the “Lake / Reservoir” factors in as a long-term goal, the initial phases are often quite small in scale
7. Sizes of Hadoop Clusters vary widely
– Investment Sizes Differ (by a lot): some “start” with mega-commitments (1000’s of nodes) and others start very small
8. Commodity H/W Clusters Dominate
– Commodity: for use cases designed to work across groups
– Appliances: for use cases attached to a single project
9. Data Lakes as a Way to Handle Vendor Diversity
– Middleware for Data: bigger customers have DWs/DBs from every vendor and 6+ different BI tools; Hadoop is becoming the “canonical” data platform to sit in between
10. Open Source Data Platform is a Strategic Priority
– Senior Stakeholder Feedback: as a design-point priority for their “next gen,” it is becoming more important that Open Source has a central role to play in the enterprise data platform
11. Industry Clusters
– 1. Banking, 2. Insurance, 3. Manufacturing, 4. Media, 5. Retail
Agenda
1. Oracle Data Integration for Big Data
2. Big Data Patterns
3. A Practitioner’s View on Oracle Data Integration for Big Data
4. Q & A
THOUGHTS ON ORACLE DATA INTEGRATION FOR BIG DATA - A PRACTITIONER'S VIEW
Mark Rittman, Oracle ACE Director
ORACLE OPENWORLD 2016, SAN FRANCISCO
(C) Mark Rittman 2016 W: http://www.rittman.co.uk T: @markrittman
• Oracle ACE Director, blogger + ODTUG member
• Regular columnist for Oracle Magazine
• Past ODTUG Executive Board Member
• Author of two books on Oracle BI
• Co-founder of Rittman Mead, now independent analyst
• 15+ Years in Oracle BI, DW, ETL + now Big Data
• Based in Brighton, UK
About the Presenter
31
• Every engagement and customer discussion has Big Data central to the project
• Hadoop extending traditional DWs through scalability, flexibility, cost, RDBMS compatibility
• Hadoop as the ETL engine, driven by ODI Big Data KMs
• New datatypes and methods of analysis enabled by Hadoop schema-on-read
• Project innovation driven by machine learning, streaming, and the ability to store + keep *all* data
Big Data Technology Core to Modern BI Platforms
32
• And what is driving the interest in these projects…?
Data Reservoir
Oracle Data Visualization
Oracle Big Data Platform
Oracle Big Data Discovery
Safe & secure Discovery and Development environment
Data sets and samples | Models and programs
Marketing /Sales Applications
Models
MachineLearning
Segments
Operational Data
Transactions
Customer Master Data
Event, Social + Unstructured Data
Voice + Chat Transcripts
Data Factory
OGG for Big Data 12c
Oracle Stream Analytics
Data streams | ODI12c
Raw Customer Data: data stored in the original format (usually files) such as SS7, ASN.1, JSON etc.
Mapped Customer Data: data sets produced by mapping and transforming raw data
Oracle Data Preparation
Oracle Big Data Appliance Starter Rack + Expansion
• Cloudera CDH + Oracle software
• 18 high-spec Hadoop Nodes with InfiniBand switches for internal Hadoop traffic, optimised for network throughput
• 1 Cisco Management Switch
• Single place for support for H/W + S/W
Enriched Customer Profile
Modeling
Scoring
Infiniband
• Data from all the sources will need to be integrated to create the single customer view
• Hadoop technologies (Flume, Kafka, Storm) can be used to ingest events and log data
• Files can be loaded “as is” into the HDFS filesystem
• Oracle/DB data can be bulk-loaded using Sqoop
• GoldenGate for trickle-feeding transactional data
• But the nature of new data sources brings challenges
• May be semi-structured or of unknown schema
• Joining schema-free datasets
• Need to consider quality and resolve incorrect, incomplete, and inconsistent customer data
The Big Data Secret? IT’s all about Data Integration
35
Single Customer View
Enriched Customer Profile
M/L
Chat
“How” | “What” | “Who” | “Why”
Data from structured + schema-on-read sources needs integrating
Requires preparation + obfuscation
Streaming sources with JSON payloads
Apply Schema to Raw and Semi-Structured Data
Heterogeneous Enterprise + Web sources
• Finding raw data is easy; then the real work needs to be done - can be > 90% of the project
• Four main tasks to land, prepare and integrate raw data to turn it into a customer profile:
1. Ingest it in real-time into the data reservoir
2. Apply Schema to Raw and Semi-Structured Data
3. Remove Sensitive Data from Any Input Files
4. Transform and map into your Customer 360-degree profile
Landing, Preparing and Securing Raw Data is *Hard*
36
• Data enrichment tool aimed at domain experts, not programmers
• Uses machine learning to automate data classification + profiling steps
• Automatically highlights sensitive data, and offers to redact or obfuscate
• Dramatically reduces the time required to onboard new data sources
• Hosted in Oracle Cloud for zero-install
• File upload and download from the browser
• Automate for production data loads
Oracle Big Data Preparation Cloud Service
37
Raw Data: data stored in the original format (usually files) such as SS7, ASN.1, JSON etc.
Mapped Data: data sets produced by mapping and transforming raw data
Voice + Chat Transcripts
Step 2: Apply Schema to Raw and Semi-Structured Data
38
NLP: embedded information in unstructured text → Entities
Embedded information, no reliable patterns
Invalid and missing data; sensitive data
Invalid emails
Stream from APIs, HTTP: moderate
Batch load from files, DB: easy
Load raw text from blog entries, reviews
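Applying schema to semi-structured text typically means pattern-extracting fields and flagging invalid values such as the bad emails called out above. A minimal sketch in illustrative Python; the field and email patterns are deliberately simplified assumptions, not production-grade validation.

```python
import re

# Semi-structured log lines: fields are embedded in free text.
lines = [
    "2016-09-20 signup user=anna email=anna@example.com",
    "2016-09-20 signup user=bob email=bob@invalid",   # malformed address
]

FIELD_RE = re.compile(r"(\w+)=(\S+)")                 # key=value extraction
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # deliberately simple check

def apply_schema(line):
    """Turn a raw line into a dict and flag invalid email values."""
    rec = dict(FIELD_RE.findall(line))
    rec["email_valid"] = bool(EMAIL_RE.match(rec.get("email", "")))
    return rec

for line in lines:
    print(apply_schema(line))
```

The schema lives in the extraction code, so it can evolve without rewriting the raw files.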
• Automatically profile and analyse datasets
• Use Machine Learning to spot and obfuscate sensitive data automatically
Step 3: Remove Sensitive Data from Any Input Files
39
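The redaction step above can be sketched as a pattern-driven pass that replaces each sensitive value with a stable hash token. This is illustrative Python only: the real service uses ML-based classification rather than fixed regexes, and the patterns here are simplified assumptions.

```python
import hashlib
import re

# Two example classes of sensitive value; a real classifier would
# detect many more, and without hand-written patterns.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def obfuscate(text):
    """Replace each sensitive value with a stable hash token, so records
    can still be joined on the obfuscated value."""
    for label, pattern in SENSITIVE_PATTERNS.items():
        def tokenize(m, label=label):
            digest = hashlib.sha256(m.group(0).encode()).hexdigest()[:8]
            return f"<{label}:{digest}>"
        text = pattern.sub(tokenize, text)
    return text

print(obfuscate("contact anna@example.com ssn 123-45-6789"))
```

Hashing (rather than blanking) preserves joinability: the same email always maps to the same token, without exposing the value itself.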
• Oracle Data Integration offers a wider set of products for managing Customer 360 data
• Oracle GoldenGate
• Oracle Enterprise Data Quality
• Oracle Data Integrator
• Oracle Enterprise Metadata Management
• All Hadoop enabled
• Works across Big Data,Relational and Cloud
Step 4: Transform, Join + Map into Polyglot Data Stores
40
• Projects built yesterday using MapReduce today need to be rewritten in Spark
• Then Spark needs to be upgraded to Spark Streaming + Kafka for real time…
• Upgrades, and replatforming onto the latest tech, can bring “fragile” initiatives to a halt
• ODI’s pluggable KM approach to big data integration makes tech upgrades simple
• Focus time + investment on new big data initiatives, not on rewriting fragile hand-coded scripts
Future-Proof Big Data Integration Platform
41
Discovery & Development Labs: safe & secure Discovery and Development environment
Data Warehouse: curated data, with a historical view and business-aligned access
ODI Desktop Client
Big Data Management Platform
Data sets and samples | Models and programs
Big Data Platform - All Running Natively Under Hadoop
YARN (Cluster Resource Management)
Hive + Pig (log processing, UDFs etc.)
HDFS (cluster filesystem holding raw data)
Kafka + Spark Streaming
Apache Beam?
Enriched Customer Profile
Modeling
Scoring
Spark (in-memory data processing)
• Big data projects have had it “easy” so far in terms of data quality + data provenance
• Innovation labs + schema-on-read prioritise discovery + insight, not accuracy and audit trails
• But a data reservoir without any cleansing, management + data quality = a data cesspool
• …and nobody knows where all the contamination came from, or who made it worse
And the Next Challenge : Data Quality + Provenance
42
• From my perspective, this is what makes Oracle Data Integration my Hadoop DI platform of choice
• Most vendors can load and transform data in Hadoop (not as well, but it’s a basic capability)
• Only Oracle has the tools to tackle tomorrow’s Big Data challenge: Data Quality + Data Governance
• Oracle Enterprise Data Quality
• Oracle Enterprise Metadata Mgmt
• Seamlessly integrated with ODI
• Brings enterprise “smarts” to less mature Big Data projects
Data Governance : Why I Recommend Oracle DI Tools
43
Presentations, Demo Stations, and Hands-on Labs
Data Integration Solutions Program - tinyurl.com/DISOOW16
Demo Stations:
• Oracle Enterprise Metadata Management
• Oracle Enterprise Data Quality
• Oracle GoldenGate
• Oracle Data Integrator
• Oracle Big Data Preparation Cloud Service
Hands-on Labs:
• Oracle Enterprise Data Quality [HOL7466]
• Oracle GoldenGate Deep Dive [HOL7528]
• ODI and OGG for Big Data [HOL7434]
• Oracle Big Data Preparation Cloud Service [HOL7432]
Locations: Middleware Demoground - Moscone South | Big Data Showcase - Moscone South | Database Demoground - Moscone South
Data Integration Solutions Program - tinyurl.com/DISOOW16
Monday, Sept 19
• Oracle Data Integration Solutions – Platform Overview and Roadmap [CON6619]
• Oracle Data Integration: the Foundation for Cloud Integration [CON6620]
• A Practical Path to Enterprise Data Governance with Cummins [CON6621]
• Oracle Data Integrator Product Update and Strategy [CON6622]
• Deep Dive into Oracle GoldenGate 12.3 New Features for the Oracle 12.2 Database [CON6555]

Tuesday, Sept 20
• Oracle Big Data Integration in the Cloud [CON7472]
• Oracle Data Integration Platform: a Cornerstone for Big Data [CON6624]
• Oracle Data Integrator and Oracle GoldenGate for Big Data [HOL7434]
• Oracle Enterprise Data Quality – Product Overview and Roadmap [CON6627]
• Self-Service Data Preparation for Domain Experts – No Programming Required [CON6630]
• Oracle Big Data Preparation Cloud Service: Self-Service Data Prep for Business Users [HOL7432]
• Oracle GoldenGate 12.3 Product Update and Strategy [CON6631]
• New GoldenGate 12.3 Services Architecture [CON6551]
• Meet the Experts: Oracle GoldenGate Cloud Service [MTE7119]

Wednesday, Sept 21
• Data Quality for the Cloud: Enabling Cloud Applications with Trusted Data [CON6629]
• Transforming Streaming Analytical Business Intelligence to Business Advantage [CON7352]
• Oracle Enterprise Data Quality for All Types of Data [HOL7466]
• Oracle GoldenGate for Big Data [CON6632]
• Accelerate Cloud On-Boarding using Oracle GoldenGate Cloud Service [CON6633]
• Oracle GoldenGate Deep Dive and Oracle GoldenGate Cloud Service for Cloud Onboarding [HOL7528]

Thursday, Sept 22
• Best Practices for Migrating to Oracle Data Integrator [CON6623]
• Best Practices for Oracle Data Integrator: Hear from the Experts [CON6625]
• Dataflow, Machine Learning and Streaming Big Data Preparation [CON6626]
• Data Governance with Oracle Enterprise Data Quality and Metadata Management [CON6628]
• Faster Design, Development and Deployment with Oracle GoldenGate Studio [CON6634]
• Getting Started with Oracle GoldenGate [CON7318]
• Best Practices for High Availability and Performance Tuning for Oracle GoldenGate [CON6558]
Oracle Cloud Platform Innovation Awards
Meet the Most Impressive Cloud Platform Innovators
• Meet peers who implemented cutting-edge solutions with Oracle Cloud Platform
• Learn how you can transform your business
No registration or OpenWorld pass required to attend
Tuesday, Sep 20, 4:00 p.m. - 6:00 p.m. | YBCA Theater | 701 Mission St

Oracle PaaS Customer Appreciation Reception
• FREE Appreciation Reception for all Oracle PaaS Customers, directly following the Innovation Awards Ceremony
No OpenWorld pass is required to attend this reception
Tuesday, Sep 20, 6:00 p.m. - 8:30 p.m. | YBCA Theater | 701 Mission St
Connect with Oracle Data Integration
@OracleDI
Blogs.oracle.com/DataIntegration/
Oracle Data Integration
Oracle Data Integration
Agenda
1. Oracle Data Integration for Big Data
2. Big Data Patterns
3. A Practitioner’s View on Oracle Data Integration for Big Data
4. Q & A
Safe Harbor StatementThe preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.