Big Data in a Relational World
Presented by: Kerry Osborne
JPMorgan Chase - December, 2012


Page 1: Big Data in a Relational World - Kerry Osborne's Oracle Blogkerryosborne.oracle-guy.com/papers/Big Data in the Relational World... · Exadoop Case Study . 10 ... Kerry Osborne kerry.osborne@enkitec.com



whoami –

•  Never Worked for Oracle
•  Worked with Oracle DB Since 1982 (V2)
•  Working with Exadata since early 2010
•  Work for Enkitec (www.enkitec.com)
   (Enkitec owns an Exadata Half Rack - V2/X2)
   (Enkitec owns an Oracle Big Data Appliance)
•  Exadata Book (recently translated to Chinese)

Blog: kerryosborne.oracle-guy.com
Twitter: @KerryOracleGuy

Hadoop Aficionado


What’s the Point?

•  Data Volumes are Increasing Rapidly
•  Cost of Processing / Storing is High
•  Scalability is a Big Concern

And …


Hadoop Is A Virus

* Stolen from Orbitz


Google Trends


Google Trends


Google Trends


Disjointed Presentation ???

•  Big Data Basics
•  Oracle Stuff
•  Architectures
•  Integration Approaches
•  Products
•  Exadoop Case Study


So What is “Big Data”?

•  Not My Favorite Term
•  Lots of Hype
•  Not the Right Tool for Every Job

* Many describe it using 3 (or occasionally 4) V’s


How Many V’s?

•  Volume
•  Velocity
•  Variety
•  Value (Value Density)


Well, How Did We Get Here?


Website Growth


Google Stack

Google File System (GFS)

Map Reduce, BigTable, Chubby

Google Applications


Open Source Hadoop Stack

Hadoop Distributed File System (HDFS)

Hadoop Map Reduce, HBase, ZooKeeper

Applications, Hive, Pig


GFS Design Goals

•  Inexpensive Commodity Components - failure expected
•  Optimize for Large Files
•  High Bandwidth More Important than Low Latency
•  Typical Workload - Write Once Read Many
•  High Append Concurrency


Map Reduce Design Goals

•  Provide Scalability
   - add more machines and it goes faster

•  Minimize Network Usage
   - they realized network resources are scarce
   - Move the Work to the Data!

•  Simplify Parallel Distributed Programming
   - hides the details of
     - parallelization
     - fault-tolerance
     - locality optimization
     - load balancing
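The design goals above can be illustrated with a word count, the usual first Map Reduce example. This is a minimal single-process sketch in Python, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are made up for illustration, and the shuffle step stands in for work a real framework would do across machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in one process (a cluster would run them in parallel)."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = run_job(["big data in a relational world", "big data basics"])
```

Each call to `map_phase` depends only on its own line and each call to `reduce_phase` only on its own key, so a framework is free to spread them over many machines. That independence is what makes "add more machines and it goes faster" plausible.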


Hadoop Meets Exadata

Oracle Open World – October, 2012

Presented by: Kerry Osborne

Hi


Traditional RDBMS Architecture

[Diagram labels: DB Server, Compute work, Plumbing, Storage]


Traditional Oracle Architecture

[Diagram labels: RAC, Cache (SGA), work, workers, dbwr lgwr etc…, Block Mapper (ASM), Storage]


HDFS/Hadoop Architecture

[Diagram labels: Name Node, Job Tracker, work, HA ?, and two nodes, each with datanode, tasktracker, workers, Storage]


HDFS/Hadoop Architecture

[Diagram labels: Block Mapper (namenode), Job Tracker, work, HA ?, and two nodes, each with datanode, tasktracker, workers, Storage]
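As a rough illustration of the two master roles in this diagram, the sketch below models a block mapper that records which datanodes hold each block, and a job tracker that sends each task to a node already holding its block. It is a toy under stated assumptions: the 64 MB block size, two replicas, round-robin placement, and all function and node names are choices made for this example. Real HDFS placement is more involved (for instance, it takes racks into account).

```python
from itertools import cycle

BLOCK_SIZE_MB = 64  # assumed block size for this sketch
REPLICAS = 2        # assumed replication factor, kept small for readability


def place_blocks(file_size_mb, datanodes):
    """Toy block mapper: split a file into blocks and give each block REPLICAS holders."""
    n_blocks = -(-file_size_mb // BLOCK_SIZE_MB)  # ceiling division
    ring = cycle(datanodes)  # round-robin placement, an assumption of this sketch
    return {block: [next(ring) for _ in range(REPLICAS)] for block in range(n_blocks)}


def schedule(block_map, busy=()):
    """Toy job tracker: run each block's task on a node that already stores the block."""
    plan = {}
    for block, holders in block_map.items():
        local = [node for node in holders if node not in busy]
        # Prefer an idle node holding the data; fall back to the first holder.
        plan[block] = local[0] if local else holders[0]
    return plan


block_map = place_blocks(200, ["datanode1", "datanode2", "datanode3"])
plan = schedule(block_map, busy={"datanode1"})
```

In this sketch every task lands on a node that stores its block, so no block has to cross the network before work starts. Keeping more than one replica also gives the scheduler a choice when one holder is busy.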


Digression: Internode Communication


Exadata Architecture

[Diagram labels: RAC, Block Mapper (ASM), Cache, work, workers, and two Storage Nodes, each with workers and Storage]


HDFS/Hadoop Architecture

[Diagram labels: Block Mapper (namenode), Job Tracker, work, HA ?, and two nodes, each with datanode, tasktracker, workers, Storage]


Oracle + Hadoop Integration


Obligatory Marketing Slide


Oracle Big Data Appliance

•  Prebuilt Hadoop Stack in a Rack
•  Engineered System
•  Open Source Software
•  Includes Cloudera Distribution


Oracle Big Data Appliance


BDA Software


Top Secret Feature of BDA


Integration Options

Many Ways to Skin the Cat

•  Fuse
•  Sqoop
•  Oracle Big Data Connectors


Fuse – External Tables


Sqoop (SQL-to-Hadoop)

•  Graduated from Incubator Status in March 2012
•  Slower (no direct path?)
•  Quest has a plug-in (oraoop)
•  Bi-Directional


Oracle Big Data Connectors

Oracle Loader for Hadoop - OLH

Oracle Direct Connector for HDFS - ODCH

Oracle R Connector for Hadoop - ORCH

Oracle Data Integrator Application Adapter for Hadoop

Note:

All Connectors are One Way


Oracle Data Integrator Application Adapter for Hadoop

ODIAAH ?


Oracle R Connector for Hadoop (ORCH)

•  Provides ability to pull data from Oracle RDBMS
•  Provides ability to pull data from HDFS
•  Provides access to local file system
•  Not really a loader tool
•  Most useful for analysts


Oracle Loader for Hadoop (OLH)

•  Implemented as a MapReduce job (oraloader.jar)
•  Saves CPU on DB Server
•  Can convert to Oracle datatypes
•  Can partition data and optionally sort it
•  Online - direct into Oracle tables
   •  Can load into Oracle via JDBC or OCI Direct Path
•  Offline - generate preprocessed files in HDFS (Data Pump (DP) format)
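To make "partition data and optionally sort it" concrete, here is a minimal sketch of that kind of preprocessing in plain Python. It is not OLH's code and does not run as a MapReduce job. The function name and the row fields (`call_month`, `call_id`) are hypothetical, chosen only to show rows being grouped by a partition key and ordered within each group before a load.

```python
from collections import defaultdict


def partition_and_sort(rows, partition_key, sort_key=None):
    """Group rows by partition_key and optionally sort each group by sort_key."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)
    if sort_key is not None:
        for group in partitions.values():
            group.sort(key=lambda row: row[sort_key])
    return dict(partitions)


# Hypothetical rows; the column names are made up for illustration.
rows = [
    {"call_month": "2012-11", "call_id": 3},
    {"call_month": "2012-12", "call_id": 2},
    {"call_month": "2012-11", "call_id": 1},
]
prepared = partition_and_sort(rows, "call_month", sort_key="call_id")
```

Doing this work on the Hadoop side is one way to read the "Saves CPU on DB Server" bullet: the rows arrive already grouped and ordered, so the database has less to do at load time.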


Oracle Direct Connector for HDFS (ODCH)

•  Uses External Tables
•  Fastest - 12T per hour
•  Can load DP files preprocessed by OLH
•  Allows Oracle SQL to query HDFS data
•  Doesn’t require loading into Oracle
•  Downside - uses DB CPUs
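For a sense of scale, the "12T per hour" figure can be converted into a per-second rate. The sketch below assumes "T" means decimal terabytes and that the rate is sustained. The 100 TB data set is a hypothetical example, not a number from the talk.

```python
TB = 10**12  # bytes in a decimal terabyte (assumed meaning of the slide's "T")
GB = 10**9   # bytes in a decimal gigabyte


def gb_per_second(tb_per_hour):
    """Convert a rate in TB per hour to GB per second."""
    return tb_per_hour * TB / 3600 / GB


def hours_to_move(data_tb, tb_per_hour):
    """Hours needed to move data_tb terabytes at a steady tb_per_hour."""
    return data_tb / tb_per_hour


rate = gb_per_second(12)        # about 3.33 GB/s
hours = hours_to_move(100, 12)  # about 8.3 hours for a hypothetical 100 TB
```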


Exadoop


* Mad Scientist Project


Exadoop


Unusual Situation!

•  Exadata Half Rack with 4 Spare Storage Servers
•  Company Playing with “Big Data” Technology
•  Exadata Cells Very Similar to BDA Servers
•  4 Cells ≈ Mini BDA! (happy face)


Exadoop Layout

Exa Half Rack
- 4 Compute Nodes
- 7 Storage Nodes (252TB)

Exadoop
- Exa Compute Nodes
- Exa Storage Nodes (108TB raw)
- Hadoop Cluster (144TB raw)

Big Fat Pipe
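The capacities on this slide line up with the four spare storage servers mentioned on the previous slide. The short check below works that out, assuming every storage cell has the same raw capacity (an assumption of this sketch, not something the slide states).

```python
TOTAL_CELLS = 7       # storage nodes in the half rack
TOTAL_RAW_TB = 252    # raw capacity across all seven
EXADATA_RAW_TB = 108  # raw capacity shown for the Exa storage nodes
HADOOP_RAW_TB = 144   # raw capacity shown for the Hadoop cluster

# Assumption: every storage cell has the same raw capacity.
per_cell_tb = TOTAL_RAW_TB // TOTAL_CELLS
exadata_cells = EXADATA_RAW_TB // per_cell_tb
hadoop_cells = HADOOP_RAW_TB // per_cell_tb
```

Under that assumption each cell holds 36 TB raw, so three cells stay with Exadata and four run Hadoop, which accounts for all seven.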


Exadoop Applications


•  Telecom Company
•  Call Detail Records Dumped by Switches
•  Loaded into HDFS via Flume


Exadoop - Proposed Architecture

[Diagram labels: Exa Compute, Exa Storage, Hadoop Cluster, SIP Server, Packet Sniffer, Flume Agent, CDR, HDFS, Hbase, Error Codes, Apex App, Java App]


Exadoop Applications



Wrap Up


Is Hadoop the right tool for the job?

Maybe

All the Cool Kids Are Doing It!


Questions?

Contact Information: Kerry Osborne

kerry.osborne@enkitec.com
kerryosborne.oracle-guy.com

www.enkitec.com