Posted on 03-Jul-2018
Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 2
Big Data Connectors: High Performance Integration for Hadoop and Oracle Database
Melli Annamalai, Sue Mavris, Rob Abbott
Program Agenda
Big Data Connectors: Brief Overview
Connecting Hadoop with Oracle Database
– Oracle Direct Connector for HDFS
– Oracle Loader for Hadoop
– Performance
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Oracle’s Big Data Platform
[Figure: Stream; Acquire; Organize & Discover; Analyze; Visualize & Decide]
Oracle’s Big Data Platform
[Figure: Hadoop and Oracle Database]
Oracle Big Data Connectors
Oracle Direct Connector for HDFS
Oracle Loader for Hadoop
Oracle R Connector for Hadoop
Oracle Data Integrator Application Adapters for Hadoop
Oracle Loader for Hadoop and Oracle Direct Connector for HDFS
Access data resident on Hadoop from Oracle Database
Load data from Hadoop into Oracle Database
Analyze all data together
– Data processed on Hadoop along with data in Oracle Database
Oracle R Connector for Hadoop
R Analytics Leveraging Hadoop and HDFS
Linearly Scale a Robust Set of R Algorithms
Leverage MapReduce for R Calculations
Compute-Intensive Parallelism for Simulations
[Figure: Oracle R Client; Hadoop map and reduce tasks; HDFS]
Oracle Data Integrator Application Adapters for Hadoop
Improving Productivity and Efficiency for Big Data
[Figure: Transforms via MapReduce (Hive); Loads; Activates; Oracle Database]
Benefits
– Consistent tooling across BI/DW, SOA, integration, and Big Data
– Reduces the complexity of processing on Hadoop through graphical tooling
– Improves productivity when processing Big Data (structured + unstructured)
Big Data Connectors
Oracle Loader for Hadoop and Oracle Direct Connector for HDFS
Loading and Accessing Data from Hadoop
[Figure: MapReduce jobs (map, shuffle/sort, reduce) over Input 1, Input 2, and log files, with Oracle Database as the destination]
Example Use Case
Business problem: Need insight into customer web activity (clickstream data)
Connect Hadoop with Oracle Database: Aggregate raw data and load it into the database for analysis
Business problem: Need to connect web activity with transactional activity
Connect Hadoop with Oracle Database: Perform analysis on data in place by running Oracle SQL queries
Usage Scenarios
Bulk load large volumes of data – Example: Historical data, daily uploads of data gathered during the day
Loads at regular frequency – Example: 24/7 monitoring of log feeds
Loads at irregular frequency – Example: Monitoring of sensor feeds
Access data files in place on HDFS
Oracle Direct Connector for HDFS
Accessing HDFS Data from Oracle Database
[Figure: SQL query; external table; HDFS client; HDFS; Oracle Database]
Features
Access and analyze data in place on HDFS
Query and join data on HDFS with database resident data
Load into the database using SQL if required
Automatic load balancing to maximize performance
Access or load into the database in parallel using the external table mechanism
Oracle Direct Connector for HDFS
External Tables
Access data on HDFS via external tables
– No DML operations can be run, and no indexes can be created, on external tables
Data files can be text files or Oracle Data Pump files (created by Oracle Loader for Hadoop)
Parallelism is controlled by the external table definition
Data files are grouped to distribute load evenly across parallel query (PQ) slaves
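The slide says only that data files are grouped so that load is spread evenly across parallel query slaves. It does not describe the grouping rule. One way to do this, assuming the goal is to balance total bytes per slave, is the greedy longest-processing-time heuristic sketched below. The function name `group_files`, the file names and the sizes are all hypothetical, and this is not the connector's actual code.

```python
import heapq
from typing import Dict, List, Tuple


def group_files(file_sizes: Dict[str, int], num_groups: int) -> List[List[str]]:
    """Assign files to num_groups groups so total bytes per group are roughly equal.

    Hypothetical sketch: greedy longest-processing-time (LPT) heuristic that
    places the largest remaining file into the currently lightest group.
    """
    if num_groups < 1:
        raise ValueError("num_groups must be at least 1")
    # Min-heap of (bytes assigned so far, group index).
    heap: List[Tuple[int, int]] = [(0, i) for i in range(num_groups)]
    heapq.heapify(heap)
    groups: List[List[str]] = [[] for _ in range(num_groups)]
    # Largest files first; ties broken by name so the result is deterministic.
    for name, size in sorted(file_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        total, idx = heapq.heappop(heap)
        groups[idx].append(name)
        heapq.heappush(heap, (total + size, idx))
    return groups


if __name__ == "__main__":
    # Hypothetical HDFS data files and their sizes in MB.
    sizes = {"part-00000": 900, "part-00001": 700, "part-00002": 400,
             "part-00003": 300, "part-00004": 300, "part-00005": 100}
    for i, group in enumerate(group_files(sizes, 3)):
        print(i, group, sum(sizes[f] for f in group))
```

Placing the largest files first prevents one slave from being left with a large file at the end. It is a heuristic, so the groups come out close to balanced but are not guaranteed to be optimal.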
Oracle Direct Connector for HDFS
3 Simple Steps
Create the external table
Run the Oracle Direct Connector for HDFS utility to publish HDFS content to the external table
Access and load into the database using SQL

>hadoop jar \
  $ODCH_HOME/jlib/orahdfs.jar \
  oracle.hadoop.hdfs.exttab.ExternalTable \
  -conf MyConf.xml \
  -publish
Performance Comparison
[Figure: Load rate (TB/hour) and CPU usage (CPU seconds used per GB) for Fuse-DFS vs. Oracle Direct Connector for HDFS]
Key Benefits
Uniquely enables access to HDFS data files from Oracle Database
Performance
– 12 TB/hour from Oracle Big Data Appliance to Oracle Exadata
– 5x to 20x faster than comparable third-party products
Easy to use for Oracle DBAs and Hadoop developers
Developed and supported by Oracle
Oracle Loader for Hadoop
[Figure: MapReduce job (map, shuffle/sort, reduce) labeled Oracle Loader for Hadoop]
Features
Offloads data pre-processing from the database server to Hadoop
Works with a range of input data formats
Handles skew in input data to maximize performance
Online and offline modes (offline: create Oracle Data Pump files on HDFS)
Connect to the database from reducer nodes, load into database partitions in parallel (JDBC or direct path)
Read target table metadata from the database
Partition, sort, and convert into Oracle data types on Hadoop
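The "partition, sort" step can be illustrated with a small sketch under stated assumptions:

- The target is a range-partitioned table, described by a sorted list of `VALUES LESS THAN` upper bounds.
- Records are hypothetical `(key, payload)` tuples.
- `partition_and_sort` is an illustrative name, not part of Oracle Loader for Hadoop.

Each record goes to the partition whose bound range contains its key. Each partition's records are then ordered by key, so that one reducer could load one partition's rows in order.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date
from typing import Dict, List, Sequence, Tuple

Record = Tuple[date, str]  # hypothetical record shape: (partition key, payload)


def partition_and_sort(records: Sequence[Record],
                       upper_bounds: Sequence[date]) -> Dict[int, List[Record]]:
    """Group records by target range partition, then sort each group by key.

    Partition i holds keys < upper_bounds[i] (like VALUES LESS THAN), and
    upper_bounds must be sorted. Keys >= the last bound fit no partition.
    """
    parts: Dict[int, List[Record]] = defaultdict(list)
    for rec in records:
        # bisect_right sends a key equal to a bound into the next partition,
        # matching the strict "less than" semantics.
        idx = bisect_right(upper_bounds, rec[0])
        if idx == len(upper_bounds):
            raise ValueError(f"key {rec[0]} is beyond the highest partition bound")
        parts[idx].append(rec)
    # Sort within each partition so its rows come out in key order.
    return {i: sorted(rs) for i, rs in sorted(parts.items())}


if __name__ == "__main__":
    # Hypothetical monthly partitions: < Feb 1, < Mar 1, < Apr 1.
    bounds = [date(2012, 2, 1), date(2012, 3, 1), date(2012, 4, 1)]
    clicks = [(date(2012, 3, 5), "/cart"), (date(2012, 1, 9), "/home"),
              (date(2012, 1, 2), "/search")]
    for part, rows in partition_and_sort(clicks, bounds).items():
        print(part, rows)
```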
Oracle Loader for Hadoop
Input Formats
Delimited text InputFormat
Hive tables InputFormat
Avro record InputFormat
User-written InputFormat
(Planned) Regular expression InputFormat
(Planned) Oracle NoSQL Database InputFormat
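Before delimited text can be loaded, each line has to be split into fields and each field converted to its column type. The sketch below rests on several assumptions:

- A hypothetical column list stands in for the target table metadata that the loader reads from the database.
- Only three Oracle type names are handled.
- Dates are expected in ISO `yyyy-mm-dd` form.

Real Oracle Loader for Hadoop input formats and type conversion are configured differently. This sketch only shows the idea.

```python
import csv
import io
from datetime import date
from decimal import Decimal
from typing import Callable, Dict, Iterator, List, Optional, Sequence, Tuple

# Hypothetical column metadata: (column name, Oracle type name).
Column = Tuple[str, str]

# Hypothetical mapping from Oracle type names to Python converters.
CONVERTERS: Dict[str, Callable[[str], object]] = {
    "NUMBER": Decimal,
    "DATE": date.fromisoformat,  # assumes yyyy-mm-dd input
    "VARCHAR2": str,
}


def parse_delimited(text: str, columns: Sequence[Column],
                    delimiter: str = ",") -> Iterator[List[Optional[object]]]:
    """Yield one typed row per line of delimited text."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for lineno, fields in enumerate(reader, start=1):
        if len(fields) != len(columns):
            raise ValueError(f"line {lineno}: expected {len(columns)} fields, got {len(fields)}")
        row: List[Optional[object]] = []
        for value, (_name, oracle_type) in zip(fields, columns):
            # Oracle treats an empty string as NULL, so map "" to None.
            row.append(None if value == "" else CONVERTERS[oracle_type](value))
        yield row


if __name__ == "__main__":
    cols = [("ID", "NUMBER"), ("DAY", "DATE"), ("URL", "VARCHAR2")]
    for row in parse_delimited("1,2012-07-03,/home\n2,,/cart\n", cols):
        print(row)
```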
Load Balancing across Reducers
Automatically Handle Input Data Skew
Distribute load evenly across reduce tasks
– All reducers do approximately the same amount of work
– Avoids slowdown caused by unbalanced reducer loads
– Maximizes performance
Data is sampled to determine optimal partitioning of map output keys
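The slide states that data is sampled to decide how map output keys are partitioned. One common way to do this is shown below as an assumption, not as the loader's documented algorithm: sort a random sample of keys and take evenly spaced quantiles as split points. Each reducer then receives a roughly equal share of records even when keys are skewed. `split_points` and `reducer_for` are hypothetical names.

```python
import random
from bisect import bisect_right
from collections import Counter
from typing import List, Sequence, TypeVar

K = TypeVar("K")


def split_points(keys: Sequence[K], num_reducers: int, sample_size: int,
                 seed: int = 0) -> List[K]:
    """Pick num_reducers - 1 split points as quantiles of a random key sample."""
    if num_reducers < 1 or sample_size < 1 or not keys:
        raise ValueError("need num_reducers >= 1, sample_size >= 1, and keys")
    rng = random.Random(seed)
    sample = sorted(rng.sample(list(keys), min(sample_size, len(keys))))
    # Evenly spaced positions in the sorted sample approximate the
    # key distribution's quantiles, wherever the keys cluster.
    return [sample[(i * len(sample)) // num_reducers] for i in range(1, num_reducers)]


def reducer_for(key: K, splits: Sequence[K]) -> int:
    """Reducer i receives keys in [splits[i-1], splits[i])."""
    return bisect_right(splits, key)


if __name__ == "__main__":
    # Hypothetical skewed keys: squares are dense near zero, so equal-width
    # key ranges would give the first reducer far more records than the last.
    keys = [i * i for i in range(10_000)]
    splits = split_points(keys, num_reducers=4, sample_size=5_000)
    print(Counter(reducer_for(k, splits) for k in keys))
```

Quantile splits balance record counts only across distinct key ranges. If a single key value accounts for a large share of the data, every record with that key still lands on one reducer, so a complete implementation needs extra handling for such heavy keys.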
Oracle Loader for Hadoop
2 Simple Steps
Create the target table
Submit the Oracle Loader for Hadoop job to the cluster

>hadoop jar \
  $OLH_HOME/jlib/oraloader.jar \
  oracle.hadoop.loader.OraLoader \
  -conf MyConf.xml
Performance Comparison: Third-Party Products
[Figure: Load rate (TB/hour) and CPU usage (CPU seconds used per GB) for a comparable third-party product vs. Oracle Loader for Hadoop]
Key Benefits
Load directly from HDFS, Hive tables, … into Oracle Database without intermediate staging files
Performance
– 10x faster than comparable third-party products
Offload database server processing onto Hadoop
– Minimizes impact on the performance SLAs of production applications
Easy to use for Oracle DBAs and Hadoop developers
Developed and supported by Oracle
Oracle Loader for Hadoop and Oracle Direct Connector for HDFS
[Figure: Oracle Loader for Hadoop (map, shuffle/sort, reduce) and Oracle Direct Connector for HDFS (HDFS client, external table, SQL query) connecting Hadoop to Oracle Database]
Performance Summary
• 12 TB/hour (66 billion rows)
• 5 to 20 times faster than third-party products
• Reduced database CPU usage compared with the alternatives
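The headline figure can be converted into per-second rates with simple arithmetic. This assumes decimal units (1 TB = 10^12 bytes), which the slides do not specify. With binary terabytes, the byte rates would be about 10% higher.

```python
# Back-of-envelope rates implied by "12 TB/hour (66 billion rows)".
# Assumption: TB means 10**12 bytes (decimal); the slides do not say.
TB = 10**12
SECONDS_PER_HOUR = 3600

bytes_per_hour = 12 * TB
rows_per_hour = 66 * 10**9

gb_per_second = bytes_per_hour / SECONDS_PER_HOUR / 10**9
rows_per_second = rows_per_hour / SECONDS_PER_HOUR
bytes_per_row = bytes_per_hour / rows_per_hour

print(f"{gb_per_second:.2f} GB/s")                      # 3.33 GB/s
print(f"{rows_per_second / 1e6:.1f} million rows/s")    # 18.3 million rows/s
print(f"{bytes_per_row:.0f} bytes/row on average")      # 182 bytes/row on average
```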
Summary
High-performance connectors for loading and accessing data from a Hadoop cluster
Fast and efficient connectors support a range of use cases
Simple to set up and easy to use for developers
Developed and supported by Oracle