oracle big data sql - create value with data


Post on 15-Jul-2015


Oracle Big Data SQL
Create Value with Data

David Teszler
Director, Big Data Analytics
Product Business Group
September 2014

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |


Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.


Agenda

1. Seize the Opportunity by Breaking Big Data Technical Barriers
2. Oracle Big Data SQL: Enabling Technology to Unify the Data Platform
3. Demonstration

Big Data Opportunity

Typical use cases in today’s world of fast exploration of big data

• Financial Services
– Money Laundering
– Portfolio Analysis
– Tracking Stock Market
• Manufacturing
– Supply Planning
• Retail
– Returns Fraud
– Buying Patterns
– Session-ization
• Telcos
– Money Laundering
– SIM Card Fraud
– Call Quality
• Utilities
– Network Analysis
– Quality Assessment
– Fraud

CREATE VALUE

SILOS OF INNOVATION SYSTEMS OF RECORD

Enterprise Big Data Analytics Architecture

Enabling you to Create Value from Data

• BIG DATA INTEGRATION: Streaming + Batch
• BIG DATA MANAGEMENT: Data Reservoir + Data Warehouse
• BIG DATA ANALYTICS: Discovery + Business Analytics
• BIG DATA APPLICATIONS: Mobile + Web + On-device

CREATE VALUE FROM DATA

Discover and predict, fast

Simplify access to all data

Secure and govern all data

MAKING BIG DATA BUSINESS AS USUAL with Oracle Big Data Enabling your Organization

Array of Technologies

• Relational: Run the Business
– Business transactions
– Business analytics
• Hadoop: Change the Business
– Data reservoirs
– Exploit new analyses
• NoSQL: Scale the Business
– Fast simple data structures
– Scale-out economically

Barriers to Adoption of New Technologies

• INTEGRATION: Adding Big Data to existing architecture is complex
• SKILLS: Lack tools and training to exploit Big Data
• SECURITY: No clear route to governance or enforcement

Overcoming Barriers to Adoption of New Technologies

• INTEGRATION: Engineered Systems
• SKILLS: SQL on All Data
• SECURITY: Database Security on All Data

Oracle Big Data Management System

[Diagram. Labels shown: SOURCES; Oracle Database; Oracle Industry Models; Oracle Advanced Analytics; Oracle Spatial & Graph; Big Data Appliance; Cloudera Hadoop; Oracle Big Data Discovery; Oracle NoSQL Database; Oracle R Advanced Analytics for Hadoop; Oracle Advanced Security; Oracle Exadata; Oracle Big Data Connectors; Oracle Data Integrator; Oracle Big Data SQL]

Data Analytics Challenge

Separate silos of information to analyze

Data Analytics Challenge

Separate data access interfaces

SQL on Hadoop is Obvious

Stinger


Data Analytics Challenge

No comprehensive SQL interface

Oracle Big Data Management System

Preserving investment in SQL for Big Data analytics

NoSQL

Snapshot of Oracle SQL Analytic Functions

Use Rich Oracle SQL Dialect Over All Data

• Ranking functions

– rank, dense_rank, cume_dist, percent_rank, ntile

• Window Aggregate functions (moving and cumulative)

– Avg, sum, min, max, count, variance, stddev, first_value, last_value

• LAG/LEAD functions

– Direct inter-row reference using offsets

• Reporting Aggregate functions

– Sum, avg, min, max, variance, stddev, count, ratio_to_report

• Statistical Aggregates

– Correlation, linear regression family, covariance

• Linear regression

– Fitting of an ordinary-least-squares regression line to a set of number pairs.

– Frequently combined with the COVAR_POP, COVAR_SAMP, and CORR functions

• Descriptive Statistics

– DBMS_STAT_FUNCS: summarizes numerical columns of a table and returns count, min, max, range, mean, stats_mode, variance, standard deviation, median, quantile values, +/- n sigma values, top/bottom 5 values

• Correlations

– Pearson’s correlation coefficients, Spearman's and Kendall's (both nonparametric).

• Cross Tabs

– Enhanced with % statistics: chi squared, phi coefficient, Cramer's V, contingency coefficient, Cohen's kappa

• Hypothesis Testing

– Student t-test, F-test, Binomial test, Wilcoxon Signed Ranks test, Chi-square, Mann Whitney test, Kolmogorov-Smirnov test, One-way ANOVA

• Distribution Fitting

– Kolmogorov-Smirnov Test, Anderson-Darling Test, Chi-Squared Test, Normal, Uniform, Weibull, Exponential
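To make the ranking, LAG/LEAD, and window aggregate bullets concrete, here is a minimal sketch that runs three such functions over a tiny table. It uses Python's built-in sqlite3 module purely as a stand-in engine (an assumption for the sake of a runnable example; it needs SQLite 3.25 or later), and the `sales` table and its columns are hypothetical. The `RANK() OVER`, `LAG() OVER`, and `SUM() OVER` syntax shown is the standard window-function form that Oracle SQL also accepts.

```python
import sqlite3

def analytic_demo():
    """Run RANK, LAG and a cumulative SUM over a small in-memory table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: monthly sales amount per sales rep.
    conn.execute("CREATE TABLE sales (rep TEXT, month INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            ("ann", 1, 100.0), ("ann", 2, 150.0), ("ann", 3, 120.0),
            ("bob", 1, 200.0), ("bob", 2, 180.0), ("bob", 3, 260.0),
        ],
    )
    rows = conn.execute(
        """
        SELECT rep, month, amount,
               -- ranking function: best month per rep gets rank 1
               RANK() OVER (PARTITION BY rep ORDER BY amount DESC) AS rnk,
               -- LAG: direct reference to the previous row's value
               LAG(amount) OVER (PARTITION BY rep ORDER BY month) AS prev_amount,
               -- window aggregate: cumulative sum per rep
               SUM(amount) OVER (PARTITION BY rep ORDER BY month
                                 ROWS BETWEEN UNBOUNDED PRECEDING
                                          AND CURRENT ROW) AS running_total
        FROM sales
        ORDER BY rep, month
        """
    ).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in analytic_demo():
        print(row)
```

Each output row carries the original columns plus the rank, the previous month's amount (NULL for the first month), and the running total, all computed in a single pass of SQL rather than in procedural code.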


        } else {
            next = lineNext.getQuantity();
        }
        if (!q.isEmpty() && (prev.isEmpty() || (eq(q, prev) && gt(q, next)))) {
            state = "S";
            return state;
        }
        if (gt(q, prev) && gt(q, next)) {
            state = "T";
            return state;
        }
        if (lt(q, prev) && lt(q, next)) {
            state = "B";
            return state;
        }
        if (!q.isEmpty() && (next.isEmpty() || (gt(q, prev) && eq(q, next)))) {
            state = "E";
            return state;
        }
        if (q.isEmpty() || eq(q, prev)) {
            state = "F";
            return state;
        }
        return state;
    }

    private boolean eq(String a, String b) {
        if (a.isEmpty() || b.isEmpty()) {
            return false;
        }
        return a.equals(b);
    }

    private boolean gt(String a, String b) {
        if (a.isEmpty() || b.isEmpty()) {
            return false;
        }
        return Double.parseDouble(a) > Double.parseDouble(b);
    }

    private boolean lt(String a, String b) {
        if (a.isEmpty() || b.isEmpty()) {
            return false;
        }
        return Double.parseDouble(a) < Double.parseDouble(b);
    }

    public String getState() {
        return this.state;
    }
}

BagFactory bagFactory = BagFactory.getInstance();

@Override
public Tuple exec(Tuple input) throws IOException {
    long c = 0;
    String line = "";
    String pbkey = "";
    V0Line nextLine;
    V0Line thisLine;
    V0Line processLine;
    V0Line evalLine = null;
    V0Line prevLine;
    boolean noMoreValues = false;
    String matchList = "";
    ArrayList<V0Line> lineFifo = new ArrayList<V0Line>();
    boolean finished = false;
    DataBag output = bagFactory.newDefaultBag();

    if (input == null) {
        return null;
    }
    if (input.size() == 0) {
        return null;
    }
    Object o = input.get(0);
    if (o == null) {
        return null;
    }
    //Object o = input.get(0);
    if (!(o instanceof DataBag)) {
        int errCode = 2114;
        String msg = "Expected input to be DataBag, but"

Simplified, sophisticated, standards based syntax

Pattern Matching With Oracle SQL

Snapshot of Oracle SQL Analytic Functions

SELECT first_x, last_z
FROM ticker MATCH_RECOGNIZE (
  PARTITION BY name ORDER BY time
  MEASURES FIRST(x.time) AS first_x,
           LAST(z.time) AS last_z
  ONE ROW PER MATCH
  PATTERN (X+ Y+ W+ Z+)
  DEFINE X AS (price < PREV(price)),
         Y AS (price > PREV(price)),
         W AS (price < PREV(price)),
         Z AS (price > PREV(price) AND
               z.time - FIRST(x.time) <= 7 ))
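For readers who want to see what the pattern above is looking for, the following is a rough procedural sketch of the same double-bottom search over a single ticker. It is not how the database evaluates MATCH_RECOGNIZE: each price move is encoded as D (down), U (up), or F (flat), and a regular expression finds down/up/down/up runs, with the 7-unit time limit applied to the final rising run. The function name, the `(time, price)` tuple layout, and numeric time values are assumptions made for illustration.

```python
import re

# One or more falls, rises, falls, then rises: the "W" shape.
W_SHAPE = re.compile(r"D+U+D+(U+)")

def find_double_bottoms(ticks, max_span=7):
    """Return (first_x_time, last_z_time) pairs for one ticker.

    ticks: list of (time, price) tuples already sorted by time.
    max_span: largest allowed gap between the first falling row and
              any row of the final rising run (mirrors the "<= 7" test).
    """
    # moves[k] describes ticks[k + 1] relative to ticks[k].
    moves = "".join(
        "D" if cur < prev else "U" if cur > prev else "F"
        for (_, prev), (_, cur) in zip(ticks, ticks[1:])
    )
    matches = []
    pos = 0
    while True:
        m = W_SHAPE.search(moves, pos)
        if m is None:
            break
        first_x = m.start() + 1          # index of the first falling row
        last_z = None
        # Accept rising rows only while they stay inside the time limit.
        for i in range(m.start(1) + 1, m.end(1) + 1):
            if ticks[i][0] - ticks[first_x][0] > max_span:
                break
            last_z = i
        if last_z is None:
            pos = m.start() + 1          # retry from the next starting row
        else:
            matches.append((ticks[first_x][0], ticks[last_z][0]))
            pos = last_z                 # continue after the last matched row
    return matches

if __name__ == "__main__":
    prices = [10, 9, 8, 9, 10, 9, 8, 9, 10]
    print(find_double_bottoms(list(enumerate(prices))))
```

Even this simplified version needs explicit bookkeeping for run boundaries and restarts, which is the work the declarative PATTERN and DEFINE clauses take over.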

250+ lines of Java UDF versus 12 lines of SQL

20x less code

Finding Patterns in Stock Market Data - Double Bottom (W)

[Chart: ticker price over time, 10:00 to 10:25]

Oracle Big Data SQL – A New Architecture

• Powerful, high-performance SQL on Hadoop

– Full Oracle SQL capabilities on Hadoop

– SQL query processing local to Hadoop nodes

• Simple data integration of Hadoop and Oracle Database

– Single SQL point-of-entry to access all data

– Scalable joins between Hadoop and RDBMS data

• Optimized hardware

– Balanced Configurations

– No bottlenecks

Want to know what this really means?

100%

Data Stored in Hadoop

Hadoop/NoSQL Ecosystem

{"custId":1185972,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:07","recommended":null,"activity":8}
{"custId":1354924,"movieId":1948,"genreId":9,"time":"2012-07-01:00:00:22","recommended":"N","activity":7}
{"custId":1083711,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:26","recommended":null,"activity":9}
{"custId":1234182,"movieId":11547,"genreId":44,"time":"2012-07-01:00:00:32","recommended":"Y","activity":7}
{"custId":1010220,"movieId":11547,"genreId":44,"time":"2012-07-01:00:00:42","recommended":"Y","activity":6}
{"custId":1143971,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:43","recommended":null,"activity":8}
{"custId":1253676,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:50","recommended":null,"activity":9}
{"custId":1351777,"movieId":608,"genreId":6,"time":"2012-07-01:00:01:03","recommended":"N","activity":7}
{"custId":1143971,"movieId":null,"genreId":null,"time":"2012-07-01:00:01:07","recommended":null,"activity":9}
{"custId":1363545,"movieId":27205,"genreId":9,"time":"2012-07-01:00:01:18","recommended":"Y","activity":7}
{"custId":1067283,"movieId":1124,"genreId":9,"time":"2012-07-01:00:01:26","recommended":"Y","activity":7}
{"custId":1126174,"movieId":16309,"genreId":9,"time":"2012-07-01:00:01:35","recommended":"N","activity":7}
{"custId":1234182,"movieId":11547,"genreId":44,"time":"2012-07-01:00:01:39","recommended":"Y","activity":7}
{"custId":1346299,"movieId":424,"genreId":1,"time":"2012-07-01:00:05:02","recommended":"Y","activity":4}

Example: Files with JSON data
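As an illustration of what "files with JSON data" means for a reader of such a log, the sketch below parses a few of the records shown on this slide, one JSON object per line, and applies structure only at read time. The function names are hypothetical, and the assumption that each record sits on its own line is ours; the slide does not state the file layout.

```python
import json
from collections import Counter

# Three records copied from the sample log, one JSON object per line.
SAMPLE_LOG = """\
{"custId":1185972,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:07","recommended":null,"activity":8}
{"custId":1354924,"movieId":1948,"genreId":9,"time":"2012-07-01:00:00:22","recommended":"N","activity":7}
{"custId":1234182,"movieId":11547,"genreId":44,"time":"2012-07-01:00:00:32","recommended":"Y","activity":7}
"""

def read_clicks(text):
    """Parse newline-delimited JSON into a list of dicts (JSON null becomes None)."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

def activity_counts(clicks):
    """Count how many log records carry each activity code."""
    return Counter(click["activity"] for click in clicks)

if __name__ == "__main__":
    clicks = read_clicks(SAMPLE_LOG)
    print(activity_counts(clicks))
```

Nothing about the file declares column names or types up front; the reader imposes them while parsing, which is the schema-on-read idea the later slides rely on.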

SQL-on-Hadoop Engines Share Metadata, not MapReduce

Hive Metastore

Hive, Impala, Spark SQL, Oracle Big Data SQL, …

Table Definitions: movieapp_log_json, Tweets, avro_log

Metastore maps DDL to Java access classes

Enhanced Oracle External Tables

• New types of external tables

– ORACLE_HIVE (inherit metadata)

– ORACLE_HDFS (specify metadata)

• Access parameters for Big Data

– Hadoop cluster

– Remote Hive database/table

• DBMS_HADOOP Package for automatic import

CREATE TABLE movielog (
  click VARCHAR2(4000))
ORGANIZATION EXTERNAL (
  TYPE ORACLE_HIVE
  DEFAULT DIRECTORY DEFAULT_DIR
  ACCESS PARAMETERS
  (
    com.oracle.bigdata.tablename logs
    com.oracle.bigdata.cluster mycluster
  ))
REJECT LIMIT UNLIMITED;

Schema on Read


CUSTOMERS

SELECT name, SUM(purchase)

FROM customers

GROUP BY name;

Intelligent Storage Maximizes Performance

What Can Big Data Learn from Exadata?

Oracle Exadata Storage Server

1. Oracle SQL query issued
• Plan constructed
• Query executed
2. Smart Scan Works on Storage
• Filter out unneeded rows
• Project only queried columns
• Score data models
• Bloom filters to speed up joins
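The first two Smart Scan bullets, filtering rows and projecting columns at the storage tier, can be pictured with a small sketch. This is a conceptual model only, not Exadata's implementation: `smart_scan`, `total_by_name`, the `region` column, and the sample rows are all hypothetical. The point it illustrates is that only the rows and columns the query needs leave the storage side before the database aggregates them.

```python
def smart_scan(stored_rows, predicate, columns):
    """Storage-side step: drop unneeded rows, keep only the queried columns."""
    for row in stored_rows:
        if predicate(row):
            yield {col: row[col] for col in columns}

def total_by_name(scanned_rows):
    """Database-side step: SUM(purchase) ... GROUP BY name over the reduced rows."""
    totals = {}
    for row in scanned_rows:
        totals[row["name"]] = totals.get(row["name"], 0) + row["purchase"]
    return totals

if __name__ == "__main__":
    # Hypothetical CUSTOMERS rows with more columns than the query needs.
    customers = [
        {"name": "ann", "purchase": 10, "region": "EU", "notes": "x" * 50},
        {"name": "bob", "purchase": 7, "region": "US", "notes": "y" * 50},
        {"name": "ann", "purchase": 5, "region": "EU", "notes": "z" * 50},
    ]
    scanned = smart_scan(customers, lambda r: r["region"] == "EU", ("name", "purchase"))
    print(total_by_name(scanned))
```

The wide `notes` column and the non-matching row never reach the aggregation step, which is where the reduction in data movement comes from.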

Big Data SQL Server: A New Hadoop Processing Engine

• Processing Layer: MapReduce and Hive, Spark, Impala, Search, Big Data SQL
• Resource Management (YARN, cgroups)
• Storage Layer: Filesystem (HDFS), NoSQL Databases (Oracle NoSQL DB, HBase)

How do we query Hadoop?

Big Data SQL Query Execution

[Diagram labels: Hive Metastore; HDFS NameNode; HDFS Data Node + BDS Server (shown twice)]

1. Query compilation determines:
• Data locations
• Data structure
• Parallelism
2. Fast reads using Big Data SQL Server
• Schema-on-read using Hadoop classes
• Smart Scan selects only relevant data
3. Process filtered result
• Move relevant data to database
• Join with database tables
• Apply database security policies
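The three execution steps can be sketched as a toy pipeline: the caller supplies the data blocks (standing in for locations found at compile time), each block is scanned in parallel with filtering and projection, and the reduced rows are then joined with a database-side table. This is an illustrative model under our own assumptions, not Big Data SQL's actual code path; `scan_node`, `run_query`, and the `customers` lookup are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def scan_node(block, predicate, columns):
    """Step 2: a per-node scan that filters rows and projects columns locally."""
    return [{col: row[col] for col in columns} for row in block if predicate(row)]

def run_query(blocks, predicate, columns, customers):
    """Steps 1-3 in miniature.

    blocks: one list of rows per data node (locations known after compilation).
    customers: database-side lookup table, custId -> name.
    """
    # Step 2: scan every block in parallel, one worker per block.
    with ThreadPoolExecutor(max_workers=max(1, len(blocks))) as pool:
        partials = list(pool.map(lambda b: scan_node(b, predicate, columns), blocks))
    # Step 3: move only the filtered rows to the database and join there.
    filtered = [row for part in partials for row in part]
    return [
        dict(row, name=customers[row["custId"]])
        for row in filtered
        if row["custId"] in customers
    ]

if __name__ == "__main__":
    blocks = [
        [{"custId": 1, "movieId": 10, "activity": 7},
         {"custId": 2, "movieId": None, "activity": 8}],
        [{"custId": 3, "movieId": 11, "activity": 7}],
    ]
    customers = {1: "Ann", 3: "Cy"}
    print(run_query(blocks, lambda r: r["activity"] == 7, ("custId", "movieId"), customers))
```

Security policies from step 3 are left out here; they would apply to the joined result inside the database, as the slide's last bullet indicates.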

But How Does Security Work?

1. Database security for query access
• Virtual Private Databases
• Redaction
• Audit Vault and Database Firewall
2. Hadoop security for Hadoop jobs
• Kerberos Authentication
• Apache Sentry (RBAC)
• Audit Vault
3. System-specific encryption
• Database tablespace encryption
• BDA On-disk Encryption

SELECT * FROM my_bigdata_table
WHERE SALES_REP_ID =
  SYS_CONTEXT('USERENV','SESSION_USER');

Filter on SESSION_USER
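The effect of filtering on the session user can be modelled with a small sketch in which a policy predicate is attached to the table and applied to every query, whatever the caller asks for. This only mimics the idea behind a Virtual Private Database policy; the `PolicyProtectedTable` class and `own_rows_only` function are hypothetical and do not correspond to any Oracle API.

```python
class PolicyProtectedTable:
    """Hypothetical stand-in for a table with a row-level security policy."""

    def __init__(self, rows, policy):
        self._rows = list(rows)
        self._policy = policy  # callable(row, session_user) -> bool

    def select(self, session_user, where=None):
        """Return rows the policy allows for this user, then apply the caller's filter."""
        result = []
        for row in self._rows:
            if not self._policy(row, session_user):
                continue  # the policy is enforced before any user predicate
            if where is None or where(row):
                result.append(row)
        return result

def own_rows_only(row, session_user):
    """Policy equivalent to: SALES_REP_ID = session user."""
    return row["SALES_REP_ID"] == session_user

if __name__ == "__main__":
    table = PolicyProtectedTable(
        [
            {"SALES_REP_ID": "ALICE", "amount": 100},
            {"SALES_REP_ID": "BOB", "amount": 250},
            {"SALES_REP_ID": "ALICE", "amount": 40},
        ],
        own_rows_only,
    )
    print(table.select("ALICE"))
    print(table.select("BOB", where=lambda r: r["amount"] > 300))
```

Because the predicate lives with the table rather than with the query, the same restriction applies whether the underlying rows come from the database or from Hadoop.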

DBMS_REDACT.ADD_POLICY(
  object_schema => 'MCLICK',
  object_name => 'TWEET_V',
  column_name => 'USERNAME',
  policy_name => 'tweet_redaction',
  function_type => DBMS_REDACT.PARTIAL,
  function_parameters =>
    'VVVVVVVVVVVVVVVVVVVVVVVVV,*,3,25',
  expression => '1=1'
);
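To show the visible effect of a partial redaction like the one configured above, here is a minimal sketch that masks a range of character positions with a fixed character. It assumes that the trailing `*,3,25` in the policy means "mask positions 3 through 25 with an asterisk"; that reading, the function name, and the sample user name are our assumptions, and the real masking is performed by the database, not by application code.

```python
def redact_partial(value, mask_char="*", start=3, end=25):
    """Mask characters at 1-based positions start..end (inclusive).

    Positions beyond the end of the string are ignored, so short
    values are masked only as far as they extend.
    """
    chars = list(value)
    for i in range(start - 1, min(end, len(chars))):
        chars[i] = mask_char
    return "".join(chars)

if __name__ == "__main__":
    # Hypothetical USERNAME value from the tweet view.
    print(redact_partial("jsmith_oracle"))
```

Under these assumptions only the first two characters of each user name remain readable to a session that is subject to the policy.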

***


Summary: Oracle Big Data SQL

Oracle SQL, on all your data.

Oracle SQL on Hadoop and beyond
• With a Smart Scan service inspired by Exadata
• With native SQL operators
• With the security of Oracle Database

Key Technologies Driving Innovation

Use the Right Tool for the Job and benefit from the Power of “AND”

• Relational: Run the Business
– Business transactions
– Business analytics
• Hadoop: Change the Business
– Data reservoirs
– Exploit new analyses
• NoSQL: Scale the Business
– Fast simple data structures
– Scale-out economically

Discover and predict, fast

Simplify access to all data

Secure and govern all data

MAKING BIG DATA BUSINESS AS USUAL with Oracle Big Data Enabling your Organization

Get Started: Oracle Big Data Lite Virtual Machine

• VM containing key components of Oracle Big Data Platform
• Download from OTN
• In sync with latest BDA release
• Used for:
– Learning about the Oracle platform
– Developing applications deployed to BDA
– BDA Client

http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html