
There and back again, or how to connect Oracle and Big Data.

MAY 15 & 16, 2019

CLEVELAND PUBLIC AUDITORIUM, CLEVELAND, OHIO

WWW.NEOOUG.ORG/GLOC


Gleb Otochkin

Started to work with data in 1992

Area of expertise:
● Data Integration
● Oracle RAC
● Oracle engineered systems
● Virtualization
● Performance tuning
● Big Data
● Cloud technologies

[email protected]
@sky_vst

Senior Cloud Architect


The Journey begins here.

AGENDA

● Big Data?
● Business cases.
● Offload the data.
● Back again.
● Cloud.
● QA

Big Data?


Big Data intro.

Big Data? In Texas we call it just data.

Something about data. Why do we talk about it and where it is coming from.

[Diagram: cells Col A, Col B, Col C and Col D laid out record after record. Stored as a ROW format.]

Something about data. Why do we talk about it and where it is coming from.

[Diagram: Columnar format. Cells of the same column (Col A, Col B) are stored together.]
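The difference between the two layouts can be sketched in a few lines of Python. This is an illustration only: the records, the column names (`col_a`, `col_b`, `col_c`) and the helper functions are made up for the example and do not describe how any particular database stores its blocks.

```python
# Illustrative only: the same three records laid out row-wise and column-wise.
records = [
    {"col_a": 1, "col_b": "x", "col_c": 10.0},
    {"col_a": 2, "col_b": "y", "col_c": 20.0},
    {"col_a": 3, "col_b": "z", "col_c": 30.0},
]


def to_row_layout(rows):
    """Row format: all values of one record sit next to each other."""
    return [value for row in rows for value in row.values()]


def to_columnar_layout(rows):
    """Columnar format: all values of one column sit next to each other."""
    columns = list(rows[0].keys())
    return {name: [row[name] for row in rows] for name in columns}


row_layout = to_row_layout(records)
columnar_layout = to_columnar_layout(records)

# Aggregating one column walks over every record in the row layout,
# but only over one contiguous list in the columnar layout.
total_c = sum(columnar_layout["col_c"])
```

The row layout suits workloads that read or write whole records, while the columnar layout suits scans and aggregations over a few columns.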


Something about data. Why do we talk about it and where it is coming from.

[Diagram: Unstructured. Cells Col A, Col B, Col C and Col D, with additional cells Col B1, Col B2, Col C1 and Col C2.]

Something about data. Why do we talk about it and where it is coming from.

[Diagram: blocks of data, a buffer for in-memory data processing, and the processed data.]

Easy for GB-scale tables.

What about TB or PB?

Big Data ecosystem

[Diagram labels: VOLUME, ANALYSIS, STRUCTURE, PROCESSING, GROWS, COMPLEX, DISCOVERY.]

Some Big Data tools and terms.

● HDFS - Hadoop Distributed File System.
● Apache Hadoop - framework for distributed storage and processing.
● HBase - non-relational, distributed database.
● Kafka - distributed streaming data platform.
● Flume - framework for collecting and aggregating streaming data.
● Cassandra - open-source distributed NoSQL database.
● MongoDB - open-source cross-platform document-oriented database.
● Hive - data query and analysis framework on top of Hadoop.
● Avro - data format widely used in Big Data (JSON schema + binary data).

Business cases


Business cases. Operational activity on a DB with the rest on a Data Lake.

[Diagram: RDBMS (OLTP), RDBMS (OLTP), BD platform, Data Preparation Engine, BI & Analysis.]

Business cases. Operational activity on a DB with data in Big Data and BI on an RDBMS.

[Diagram: RDBMS (OLTP), RDBMS (OLTP), BD platform, Data Preparation Engine, BI & Analysis, RDBMS.]

Business cases. Operational activity on a DB with data in Big Data and BI on an RDBMS.

[Diagram: RDBMS (OLTP), RDBMS (OLTP), BD platform, BI & Analysis, ODI & BD SQL.]

Business cases. Operational and BI activity on a DB with the main body of data in a Big Data platform.

[Diagram: RDBMS (OLTP), RDBMS (OLTP), BD platforms, BI & Analysis, OGG, BD SQL, Kafka, RDBMS.]

Business cases. Operational and BI activity on a DB with the main body of data in a Big Data platform.

[Diagram: RDBMS (OLTP), RDBMS (OLTP), BD platforms, BI & Analysis, OGG, Kafka, Stream processing.]

Business cases. Short summary.

● Tools and methods:
  ○ Data replication tools.
    ■ Oracle GoldenGate.
  ○ Batch load tools.
    ■ Sqoop.
  ○ Data Integration tools.
    ■ Oracle Data Integrator.
  ○ Streaming.
    ■ Kafka, Spark.
  ○ Presenting Big Data to RDBMS:
    ■ Oracle Big Data SQL.
    ■ Oracle Big Data Connectors.

● Reasons:
  ○ Unstructured data.
  ○ Growing data volume.
  ○ Data retention policy.
  ○ Storage cost.
  ○ New data formats.
  ○ New API and options.

Offload the data.

Real-time replication of OLTP data to a Big Data platform.

Replication to Big Data. Why GoldenGate?

Oracle GoldenGate:
● Pros.
  ○ Real-time streaming.
  ○ No or minimal impact on the source.
  ○ Supports most data types.
  ○ Database engine agnostic.
  ○ Different BD targets.
  ○ Enterprise support.
● Cons.
  ○ Licensing fee.
  ○ Closed source code.

Replication to Big Data. Replication to HDFS by Oracle GoldenGate.

[Diagram: Apps, Database, Tran Log, Oracle GoldenGate, OGG Trail, Oracle GoldenGate, HDFS.]

Replication to Big Data - capture. GoldenGate - Source side.

● Oracle 11.2.0.4 and up.
● Archivelog mode.
● Supplemental logging:
  ○ Minimal on DB level.
  ○ On schema or table level.
● OGG user in database.

[Diagram: Apps, Database, Tran Log.]

Replication to Big Data - transport. GoldenGate - Data Pump.

● OGG 12.2.0.1 and up.
● BD adapters with replicat.
● Supported BD targets:
  ○ HDFS
  ○ Kafka
  ○ Flume
  ○ HBase
  ○ MongoDB (from 12.3)
  ○ Cassandra (from 12.3)
● Different formats.

[Diagram: Tran Log, Manager, Extract, OGG Trail, Data Pump, OGG Trail, Manager, Replicat.]

Replication to Big Data - Target. To HDFS.

● OGG 12.2.0.1 and up.
● BD adapters with replicat.
● DML and DDL.
● Supported BD targets:
  ○ HDFS
  ○ Kafka
  ○ Flume
  ○ HBase
  ○ MongoDB (from 12.3)
  ○ Cassandra (from 12.3)
● With the latest OGG for BD:
  ○ JDBC, Elasticsearch, Kinesis, Kafka Connect ...

[Diagram: OGG Trail, Manager, Replicat, HDFS Client, HDFS.]

Replication to Big Data - Target. Event handlers.

● File Writer Handler.
  ○ HDFS Event Handler.
  ○ ORC Event Handler.
  ○ Oracle Cloud Event Handler.
  ○ Parquet Event Handler.
  ○ S3 Event Handler.

[Diagram: OGG Trail, Manager, Replicat, Staging file, Event handler (ORC, Parquet), Cloud.]

Handlers.

● List of target handlers.
  ○ BigQuery Handler.
  ○ Cassandra Handler.
  ○ Elasticsearch Handler.
  ○ File Writer Handler.
    ■ HDFS Event Handler.
    ■ ORC Event Handler.
    ■ Oracle Cloud Event Handler.
    ■ Parquet Event Handler.
    ■ S3 Event Handler.
  ○ Flume Handler.
  ○ HBase Event Handler.
  ○ HDFS Handler.
  ○ JDBC Handler.
● List of target handlers (continued).
  ○ Kafka.
  ○ Kafka Connect.
  ○ Kafka REST Proxy Handler.
  ○ Kinesis Streams Handler.
  ○ MongoDB Handler.
  ○ Oracle NoSQL Handler.
  ○ Microsoft Azure Data Lake.
● Source handler.
  ○ Oracle GoldenGate Capture for Cassandra.

Replication to Big Data. Data on the source and on the target.

● New columns:
  ○ Operation type.
  ○ Table name.
  ○ Local and UTC timestamp.

orcl> select * from ggtest.test_tab_2;
ID RND_STR  USE_DATE
 1 BGBXRKJL 02/13/16 08:34:19
 2 FNMCEPWE 08/17/15 04:50:18

hive> select * from BDTEST.TEST_TAB_2;
OK
I BDTEST.TEST_TAB_2 2016-10-25 01:09:16.000168 2016-10-24T21:09:21.186000 00000000120000004759 1 BGBXRKJL 2016-02-13:08:34:19 NULL
I BDTEST.TEST_TAB_2 2016-10-25 01:09:16.000168 2016-10-24T21:09:22.827000 00000000120000004921 2 FNMCEPWE 2015-08-17:04:50:18 NULL

Replication to Big Data. Journal of changes instead of a state.

Operation      Database   HDFS
Insert row 1   Row 1      I | Row 1
Insert row 2   Row 2      I | Row 2
Insert row 3   Row 3      I | Row 3
Update row 3   Row 3      U | Row 3
Delete row 2              D | Row 2
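Because the target holds a journal and not a state, anyone who needs the current state has to replay the journal. A minimal sketch of such a replay follows, assuming each entry carries an operation flag (I, U, D), a row key and a value. The tuple layout and the `replay` helper are hypothetical and are not part of GoldenGate.

```python
# Hypothetical journal entries: (operation flag, row key, row value).
journal = [
    ("I", 1, "Row 1"),
    ("I", 2, "Row 2"),
    ("I", 3, "Row 3"),
    ("U", 3, "Row 3 (updated)"),
    ("D", 2, None),
]


def replay(entries):
    """Rebuild the current state by applying the journal in order."""
    state = {}
    for operation, key, value in entries:
        if operation in ("I", "U"):
            # Inserts and updates both leave the latest value for the key.
            state[key] = value
        elif operation == "D":
            # Deletes remove the key if it is present.
            state.pop(key, None)
        else:
            raise ValueError(f"Unknown operation flag: {operation!r}")
    return state


current_state = replay(journal)
```

After the replay only rows 1 and 3 remain, which matches the Database column above, while the journal itself still has all five entries.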


Replication to Big Data - supplemental logging. Primary key for a source table in HBase.

● Having supplemental logging for all columns:
  ○ Row id as concatenation of values for all columns.
● Adding a primary key and supplemental logging for keys:
  ○ Row id is primary key.

orcl> alter table ggtest.test_tab_2 add constraint pk_test_tab_2 primary key (pk_id);

Table altered.

orcl> insert into ggtest.test_tab_2 values(9,'PK_TEST',sysdate,null);

Replication to Big Data. Two cases. All columns supplemental logging and PK supplemental logging.

hbase(main):012:0> scan 'BDTEST:TEST_TAB_2'
ROW  COLUMN+CELL
7|IJWQRO7T|2013-07-07:08:13:52  column=cf:ACC_DATE, timestamp=1459275116849, value=2013-07-07:08:13:52
7|IJWQRO7T|2013-07-07:08:13:52  column=cf:PK_ID, timestamp=1459275116849, value=7
7|IJWQRO7T|2013-07-07:08:13:52  column=cf:RND_STR_1, timestamp=1459275116849, value=IJWQRO7T
8|TEST_INS1|2016-03-29:15:14:37|TEST_ALTER  column=cf:ACC_DATE, timestamp=1459278884047, value=2016-03-29:15:14:37
8|TEST_INS1|2016-03-29:15:14:37|TEST_ALTER  column=cf:PK_ID, timestamp=1459278884047, value=8
8|TEST_INS1|2016-03-29:15:14:37|TEST_ALTER  column=cf:RND_STR_1, timestamp=1459278884047, value=TEST_INS1
8|TEST_INS1|2016-03-29:15:14:37|TEST_ALTER  column=cf:TEST_COL, timestamp=1459278884047, value=TEST_ALTER
9  column=cf:ACC_DATE, timestamp=1462473865704, value=2016-05-05:14:44:19
9  column=cf:PK_ID, timestamp=1462473865704, value=9
9  column=cf:RND_STR_1, timestamp=1462473865704, value=PK_TEST
9  column=cf:TEST_COL, timestamp=1462473865704, value=NULL
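The two kinds of row id seen in the scan can be mimicked with a small Python helper. This is a sketch of the observed behavior only, not the HBase handler's actual code. The function name `hbase_row_key` and its parameters are made up for the illustration, and the sample values come from the scan output.

```python
def hbase_row_key(row, key_columns=None, separator="|"):
    """Build a row id from the key columns, or from all columns if none are given."""
    columns = key_columns if key_columns else list(row.keys())
    return separator.join(str(row[column]) for column in columns)


# Row captured with supplemental logging for all columns and no primary key.
row_7 = {"PK_ID": 7, "RND_STR_1": "IJWQRO7T", "ACC_DATE": "2013-07-07:08:13:52"}

# Row captured after the primary key on PK_ID was added.
row_9 = {
    "PK_ID": 9,
    "RND_STR_1": "PK_TEST",
    "ACC_DATE": "2016-05-05:14:44:19",
    "TEST_COL": None,
}

key_all_columns = hbase_row_key(row_7)
key_primary_key = hbase_row_key(row_9, key_columns=["PK_ID"])
```

With a concatenated row id, any change to a column value produces a different row id, so a primary key on the source table gives a far more stable key on the HBase side.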


Replication to Big Data.

Some notes about source:
● We are replicating a log of changes.
● Only committed transactions.
● “Passive” commit.
● Different levels of supplemental logging may lead to different captured data (compressed DML).
● DDL only with following DML.

and BD destination:
● JSON, Text, Avro (different types) and XML formats.
● Hive support through text format.
● Trimming out leading or trailing whitespaces.
● Truncates only as DML.

Other tools. What can be used as an alternative.

● Kafka using JDBC to pull data.
● Custom CDC solution built using JDBC and Java applications.
● Custom solutions based on triggers and AQ.
● Quest SharePlex.
  ○ Replication to Kafka.
  ○ Supports most data types.
  ○ https://www.quest.com/landing/shareplex/
● Attunity Replicate.
  ○ https://www.attunity.com/products/replicate/platform-support

And what about Sqoop?


Batch offloading.


Batch offloading from Oracle to Big Data.

● Sqoop is one of the oldest tools.
● Works in both directions.
● Gluent is more than just a batch tool.
● BD SQL Copy to Hadoop.
  ○ Using Oracle Data Pump format.

[Diagram: Database, Sqoop, HDFS.]

Replication to Big Data. Sqoop import.

Generates MapReduce jobs. Connects through JDBC. Good for batch processing.

[Diagram: Apps, Database, JDBC, Sqoop, MapReduce, HDFS, Hive.]

Back again

Using data from Big Data in Oracle (and Oracle data in Big Data)


Happy Return.


Oracle Big Data Connectors. Oracle Datasource for Apache Hadoop and SQL Connector for HDFS.

● Datasource for Apache Hadoop.
  ○ Hive access to Oracle Database tables.
  ○ Predicate pushdown and partition pruning.
  ○ Directly convert SQL data.
  ○ Parallel access to data.
  ○ Kerberos, SSL, Oracle Wallet.
  ○ Write data back to Oracle.
● SQL Connector for HDFS.
  ○ SQL access to data in Hadoop.
  ○ Partition-aware access of Hive.
  ○ Supports parallel query and load.
  ○ Kerberos security.
  ○ Can use Oracle Data Pump files in HDFS.

Oracle Big Data Connectors. Oracle Loader for Hadoop and Oracle XQuery for Hadoop.

● Oracle Loader for Hadoop.
  ○ Partition-aware load.
  ○ Parallel load.
  ○ Supports Parquet, JSON, Text ...
  ○ Offload data type conversion to Hadoop.
  ○ Kerberos security.
● Oracle XQuery for Hadoop.
  ○ Integration with Oozie workflows, Cloudera Search and XML for Hive.
  ○ XQueries execute where the data is located.
  ○ Works with HDFS, Hive, or Oracle NoSQL Database.
  ○ Parallel XML parsing.
  ○ Can use Oracle Loader for Hadoop to write back to Oracle.

From Big Data to Oracle.

Tool                            Filtering   Data conversion   Query / Offload query / Parallel
Sqoop                           Oracle      Oracle            Yes
ODBC gateway                    Hadoop      Oracle            Yes Yes
Oracle Loader for Hadoop        Oracle      Hadoop            Yes
Oracle SQL Connector for HDFS   Oracle      Oracle            Yes Yes
BD SQL                          Hadoop      Hadoop            Yes Yes Yes
ODI                             KM          KM                Yes Yes
Gluent                          Hadoop      Hadoop            Yes Yes Yes

Oracle Data Integrator. ODI vs traditional ETL.

[Diagram, traditional ETL: Source I, Source II, intermediate staging and transformation, Target. Steps: Extract, Transform, Load.]

[Diagram, ODI: Source I, Source II, Load engine, Target. Steps: Extract, Transform, Load, Transform.]

Oracle Data Integrator. Topology.

[Diagram: ODI Studio, ODI Agent, WebLogic, Master repository, Work repository, Oracle DB I, Oracle DB II, HDFS, CSV text, Staging schema, Target schema.]

Oracle Data Integrator. ODI workflow.

[Diagram: Designer (Project I, Project II, Data models), Logical view (Logical schema I, Logical schema II), Context (Prod Batch, Prod Stream, Test), Physical (PROD, PROD Stream, Test env).]

Oracle Data Integrator. ODI logical mapping.

• Logical mapping is separated from physical implementation.

Have a big monitor. It helps a lot!

Oracle Data Integrator. ODI physical mapping.

• Different physical mappings for the same logical map.

• Different knowledge modules.

The same advice - big monitor. It helps a lot!

Oracle Data Integrator.

● Knowledge modules for most platforms.
● Uses filters, mappings, joins and constraints.
● Batch or event (stream) oriented integration.
● ELT:
  ○ Extract.
  ○ Load.
  ○ Transform.
● Logical and physical models separation.
● Knowledge modules:
  ○ RKM - reverse engineering.
  ○ LKM - load.
  ○ IKM - integration.
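The difference between ETL and ELT can be sketched with nothing but Python and SQLite. This is an illustration under loud assumptions: SQLite stands in for the target database, the table names `stg_customers` and `customers` and the sample rows are invented, and nothing here uses ODI or its knowledge modules.

```python
import sqlite3

# Hypothetical source rows: (id, name, amount). Names need trimming and upper-casing.
source_rows = [(1, " alice ", 10), (2, "BOB", 20), (3, " Carol", 30)]


def etl(rows):
    """Traditional ETL: transform outside the target, then load the result."""
    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE customers (id INTEGER, name TEXT, amount INTEGER)")
    # The transformation runs in the intermediate engine (here: Python).
    transformed = [(i, name.strip().upper(), amount) for i, name, amount in rows]
    target.executemany("INSERT INTO customers VALUES (?, ?, ?)", transformed)
    return target.execute(
        "SELECT id, name, amount FROM customers ORDER BY id"
    ).fetchall()


def elt(rows):
    """ELT: load raw rows into staging on the target, then transform with its SQL."""
    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE stg_customers (id INTEGER, name TEXT, amount INTEGER)")
    target.execute("CREATE TABLE customers (id INTEGER, name TEXT, amount INTEGER)")
    target.executemany("INSERT INTO stg_customers VALUES (?, ?, ?)", rows)
    # The transformation runs inside the target engine as set-based SQL.
    target.execute(
        "INSERT INTO customers SELECT id, UPPER(TRIM(name)), amount FROM stg_customers"
    )
    return target.execute(
        "SELECT id, name, amount FROM customers ORDER BY id"
    ).fetchall()


etl_result = etl(source_rows)
elt_result = elt(source_rows)
```

Both paths end with the same rows in `customers`. What changes is where the transformation work is done: in a separate engine for ETL, or in the target database itself for ELT.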


Oracle Big Data SQL Architecture.

Exadata technology for Big Data.

[Diagram: Oracle Database, External table, BD SQL agent, CDH or HDP management server, BD SQL agent, Data nodes, BD SQL service, BD SQL.]

Oracle Big Data SQL. What it can do for us.

● Smart scan:
  ○ Storage indexes.
  ○ Bloom filters.
● Works with:
  ○ Apache Hive.
  ○ HDFS.
  ○ Oracle NoSQL Database.
  ○ Apache HBase.
● Fully supports SQL syntax.
● Predicate push down.

Oracle Big Data SQL. Views and packages to use.

orcl> select cluster_id, database_name, owner, table_name from all_hive_tables where database_name='bdtest';
CLUSTER_ID   DATABASE_NAME  OWNER   TABLE_NAME
bigdatalite  bdtest         oracle  test_tab_1
bigdatalite  bdtest         oracle  test_tab_2

PROCEDURE CREATE_EXTDDL_FOR_HIVE
Argument Name                  Type                    In/Out Default?
------------------------------ ----------------------- ------ --------
CLUSTER_ID                     VARCHAR2                IN
DB_NAME                        VARCHAR2                IN
HIVE_TABLE_NAME                VARCHAR2                IN
HIVE_PARTITION                 PL/SQL BOOLEAN          IN
TABLE_NAME                     VARCHAR2                IN
PERFORM_DDL                    PL/SQL BOOLEAN          IN     DEFAULT
TEXT_OF_DDL                    CLOB                    OUT

Oracle Big Data SQL. Building the external table.

CREATE TABLE test_tab_2 (
  tran_flag VARCHAR2(4000),
  tab_name VARCHAR2(4000),
  ………..
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_HIVE
  DEFAULT DIRECTORY DEFAULT_DIR
  ACCESS PARAMETERS (
    com.oracle.bigdata.cluster=bigdatalite
    com.oracle.bigdata.tablename=bdtest.test_tab_2
  )
)
PARALLEL 2 REJECT LIMIT UNLIMITED

● The Hive table can be queried in Oracle now.

TRAN_FLAG          ID RND_STR    USE_DATE
---------- ---------- ---------- --------------------
I                   1 BGBXRKJL   2016-02-13:08:34:19
I                   2 FNMCEPWE   2015-08-17:04:50:18
I                   1 BGBXRKJL   2016-02-13:08:34:19

Oracle Big Data SQL. What it can do for us.

● Parallel FULL table access to the external table.


Oracle Big Data SQL. Features of BD SQL.

● Predicate Push Down.
  ○ Partitioned Hive tables are pruned.
  ○ Apache Parquet and ORC files.
  ○ CTAS to external table.
  ○ Oracle NoSQL Database.
  ○ Apache HBase.
● Access drivers:
  ○ ORACLE_HDFS
  ○ ORACLE_HIVE
● Smart Scan for HDFS:
  ○ Parallel processing on Hadoop.
  ○ Reduces data movement.
  ○ Returns much smaller result sets.
● Storage Indexes:
  ○ Built automatically.
  ○ Up to 32 columns.
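The idea behind a storage index can be sketched as a min/max summary per block that lets a scan skip blocks that cannot contain matching values. The sketch below is a simplified illustration of that idea, not the Big Data SQL implementation. The `Block` class, the `scan_with_index` helper and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One storage block holding the values of a single column."""
    rows: list

    @property
    def min_value(self):
        return min(self.rows)

    @property
    def max_value(self):
        return max(self.rows)


def scan_with_index(blocks, low, high):
    """Return values in [low, high], reading only blocks whose range overlaps."""
    matches = []
    blocks_read = 0
    for block in blocks:
        # The min/max summary proves that this block cannot hold a match.
        if block.max_value < low or block.min_value > high:
            continue
        blocks_read += 1
        matches.extend(value for value in block.rows if low <= value <= high)
    return matches, blocks_read


blocks = [Block([1, 5, 9]), Block([10, 14, 18]), Block([20, 25, 30])]
matches, blocks_read = scan_with_index(blocks, 12, 22)
```

In this example the first block is skipped without being read, so less data has to be processed and moved for the same result.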


Some other tools. What can be used as an alternative.

● Gluent Data Platform.
  ○ Can offload data to Hadoop.
  ○ Allows querying data of any data source in Hadoop.
  ○ Offload engine.
  ○ Supports Cloudera, Hortonworks or any Hadoop with Impala or Hive SQL engine installed.

https://gluent.com/

Cloud.


Oracle GoldenGate Cloud Service. OGG Cloud service.

● Easy setup.
● Integrated with other Oracle Cloud services.
● Secure.
● Highly customizable and flexible.
● Supports multiple types of targets.

Oracle Cloud. GoldenGate Cloud Service.

[Diagram: Apps, Database, Tran Log, Oracle GoldenGate, OGG Trail, Oracle GoldenGate, Kafka.]

Oracle Data Integration Platform Cloud. Data transformation, integration, replication, analysis, and governance.

● Connect to data sources and prepare, transform, replicate, analyze, govern, and monitor data.
● Analyze data streams.
● Set up policies based on metrics to receive notifications.
● Manage all your data sources from a single platform.

● On-premises to Cloud.
● Cloud to Cloud.
● Cloud to On-premises.
● On-premises to On-premises.

Oracle Data Integration Platform Cloud

[Diagram: Apps, Database, Tran Log, DIPC Agent, Oracle GoldenGate, OGG Trail, Oracle GoldenGate, Kafka, ODI, Data Preparation.]


QA


THANK YOU

Email: [email protected]
eclipsys.ca/blog/
@sky_vst