Oracle Big Data Spatial and Graph

Big Data Spatial and Graph: An Overview. ilOUG Tech Days, June 2015. Michel Benoliel, Master Principal Consultant, Oracle Israel. Copyright © 2014 Oracle and/or its affiliates. All rights reserved.


Page 1: Oracle big data spatial and graph

Copyright © 2014 Oracle and/or its affiliates. All rights reserved.

Big Data Spatial and Graph: An Overview

ilOUG Tech Days, June 2015

Michel Benoliel, Master Principal Consultant, Oracle Israel

Page 2: Oracle big data spatial and graph

Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Page 3: Oracle big data spatial and graph

Agenda

• Spatial and Graph Strategy

• Introduction to Big Data Spatial and Graph

• Spatial Features

• Graph Features

Page 4: Oracle big data spatial and graph

Oracle’s Spatial and Graph Strategy

Enable Spatial and Graph use cases on every Big Data platform

[Figure: platform diagram. Labels: NoSQL; Oracle Big Data Spatial and Graph; Oracle Database; Spatial and Graph]

Page 5: Oracle big data spatial and graph

Conventional database or Big Data technologies

Typical technical decision criteria

[Figure: radar chart, scale 0 to 5, comparing Hadoop and/or NoSQL with a relational database on the following criteria]

• Tooling maturity

• Stringent Non-Functionals

• ACID transactional requirement

• Security

• Variety of data formats

• Data sparsity

• ETL simplicity

• Cost effectively store low value data

• Ingestion rate

• Straight Through Processing (STP)

Page 6: Oracle big data spatial and graph

The Big Picture – Oracle Big Data Management System

[Figure: architecture diagram with three areas: SOURCES, DATA RESERVOIR, and DATA WAREHOUSE. Labels as extracted:]

• DATA RESERVOIR (Big Data Appliance): Apache Flume, Oracle GoldenGate, Oracle Event Processing, Cloudera Hadoop, Oracle Big Data SQL, Oracle NoSQL, Oracle R Distribution, Oracle Big Data Spatial and Graph

• DATA WAREHOUSE (Exadata): Oracle Database (In-Memory, Multi-tenant), Oracle Industry Models, Oracle Advanced Analytics, Oracle Spatial and Graph, Oracle GoldenGate, Oracle Event Processing

• Other labels: Oracle Data Integrator, Oracle Big Data Connectors

Page 7: Oracle big data spatial and graph

Oracle Big Data Spatial and Graph

Property Graph for Analysis of:

• Social Media relationships

• Internet of Things interactions

• Cyber-Security

Spatial Analysis Features for:

• Location Data Enrichment

• Proximity and containment analysis

• Preparation of digital map and imagery data sets

Page 8: Oracle big data spatial and graph

Introduction: Spatial for Big Data

Page 9: Oracle big data spatial and graph

What is Spatial Data?

An integral part of almost every database

• Business data that contains or describes location

– Geographic features (roads, rivers, parks, etc.)

– Assets (pipelines, cables, transformers, etc.)

– Sales data (sales territory, customer registration, etc.)

– Street and postal address (customers, stores, factories, etc.)

• Anything associated with a physical location

• Described by coordinates or implicitly as text (place name), ...

• Location is a “universal key” relating otherwise unrelated entities

Page 10: Oracle big data spatial and graph

Linking information by location

Are these data points related?

• Tweet: sailing by #goldengate

• Instagram image subtitle: 골든게이트 교*

• Text message: Driving on 101 North, just reached the border between Marin County and San Francisco County

• GPS Sensor: N 37°49′11″ W 122°28′44″

• Now find all data points around Golden Gate Bridge ...

* Golden Gate Bridge (in Korean)
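
Once each record has been resolved to coordinates, "find all data points around Golden Gate Bridge" becomes a distance filter. The Python sketch below converts the GPS reading from the slide to decimal degrees and keeps records within a radius using the haversine formula. The record coordinates, the 5 km radius, and all function names are illustrative assumptions, not part of any Oracle API.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60.0 + seconds / 3600.0
    return -value if hemisphere in ("S", "W") else value


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# GPS sensor reading from the slide: N 37°49′11″ W 122°28′44″
bridge_lat = dms_to_decimal(37, 49, 11, "N")
bridge_lon = dms_to_decimal(122, 28, 44, "W")

# Hypothetical records, assumed to be already resolved to coordinates.
records = [
    ("tweet", 37.8100, -122.4770),
    ("text message", 37.8324, -122.4795),
    ("unrelated post", 34.0500, -118.2400),
]


def records_near(lat, lon, radius_km):
    """Return the names of all records within radius_km of (lat, lon)."""
    return [name for name, rlat, rlon in records
            if haversine_km(lat, lon, rlat, rlon) <= radius_km]


nearby = records_near(bridge_lat, bridge_lon, 5.0)
print(nearby)
```

The hard part in practice is the step this sketch assumes away: resolving a hashtag, a Korean place name, or a road description to coordinates, which is what the geo-enrichment features address.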

Page 11: Oracle big data spatial and graph

Oracle Big Data – Spatial Features

• Geo-enrichment for Data Harmonization

– Resolution of location-related information

– Determination of location hierarchies

• Categorization and filtering

– Tracking, proximity analysis, geo-fencing and categorization based on location

• Data preparation

– Large scale geoprocessing for cleansing, preparation of imagery, sensor data, and raw data input

• Data visualization

Page 12: Oracle big data spatial and graph

Use Cases: Geocoding, Geo-Enrichment

(i) Geocode Call Data Records, sensor data, other sources to aggregate and display wireless network performance (dropped calls, utilization)

(ii) Transportation origin-destination analysis. Combine transit card/payment info and other sources to determine where (and how many) people travel to, starting from any station on a transit network

(iii) Geotagged Twitter: where are the tourists and locals tweeting

Page 13: Oracle big data spatial and graph

Use case: Categorization, filtering, aggregation

Page 14: Oracle big data spatial and graph

Use case: Data preparation

Mosaic images

Terrains and contours

Shaded reliefs

Pyramiding: layers at different resolution
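
Pyramiding can be illustrated with a toy raster: each level halves the resolution of the previous one, so a viewer can fetch a coarse layer when zoomed out. The sketch below averages 2×2 blocks, which is one common choice. The averaging rule, the sample grid, and the function names are assumptions for illustration, not the product's image-processing API.

```python
def downsample(grid):
    """Halve the resolution by averaging each 2x2 block of cells."""
    rows, cols = len(grid) // 2, len(grid[0]) // 2
    return [
        [
            (grid[2 * r][2 * c] + grid[2 * r][2 * c + 1]
             + grid[2 * r + 1][2 * c] + grid[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def build_pyramid(grid):
    """Return [full resolution, half, quarter, ...] down to a single cell."""
    levels = [grid]
    while len(levels[-1]) >= 2 and len(levels[-1][0]) >= 2:
        levels.append(downsample(levels[-1]))
    return levels


# A 4x4 sample raster holding the values 0..15 (illustrative data).
image = [[4 * r + c for c in range(4)] for r in range(4)]
pyramid = build_pyramid(image)

for level, layer in enumerate(pyramid):
    print(level, layer)
```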

Page 15: Oracle big data spatial and graph

Spatial Features – Technical overview

• Support for spatial data in 2D or 3D in various formats, geodetic or projected

• Support for geo-referenced imagery such as satellite images in many formats

• MapReduce framework for resolution of place names and determination of location hierarchies, including a reference dataset

• Spatial indexing techniques for fast retrieval of spatial data

• Library of spatial operators for geometric analysis (inside, within distance, nearest neighbor, ...)

• Library of image processing functions (mosaic, reprojection, format conversion, analysis, ...)

• Console for visual analysis, indexing, processing

– Sample JEE application to be deployed in Jetty
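
The three spatial operators named above (inside, within distance, nearest neighbor) can be sketched on planar coordinates as follows. This is a minimal illustration: it uses a ray-casting test and brute-force scans rather than a spatial index, and the function names and sample data are assumptions, not the product's operator library.

```python
import math


def inside(point, polygon):
    """Ray-casting test: True if point lies inside the polygon."""
    x, y = point
    result = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                result = not result
    return result


def within_distance(point, candidates, max_dist):
    """All candidates no farther than max_dist from point."""
    px, py = point
    return [c for c in candidates
            if math.hypot(c[0] - px, c[1] - py) <= max_dist]


def nearest_neighbor(point, candidates):
    """The single candidate closest to point."""
    px, py = point
    return min(candidates, key=lambda c: math.hypot(c[0] - px, c[1] - py))


# Illustrative data: a square geo-fence and three store locations.
fence = [(0, 0), (4, 0), (4, 4), (0, 4)]
stores = [(1, 1), (3, 5), (6, 6)]

print(inside((2, 2), fence))
print(within_distance((2, 2), stores, 3.5))
print(nearest_neighbor((5, 5), stores))
```

A spatial index replaces the brute-force scans with lookups that touch only nearby candidates, which is what makes these operators usable on large datasets.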

Page 16: Oracle big data spatial and graph

Console

Create Index on spatial data in HDFS

Page 17: Oracle big data spatial and graph

Console

Run a MapReduce job to perform categorization based on a spatial hierarchy

Page 18: Oracle big data spatial and graph

Console

Results in Console

“Tweets in May by State”
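
A result such as "Tweets in May by State" follows the usual map and reduce pattern: the map step assigns each tweet to a state and emits a count of one, and the reduce step sums the counts per state. The sketch below runs both steps in a single process. The rectangular state boxes are rough approximations, and the tweet records and function names are hypothetical.

```python
from collections import Counter

# Approximate bounding boxes (min_lon, min_lat, max_lon, max_lat).
# Illustrative values only; real state boundaries are polygons.
STATE_BOXES = {
    "Colorado": (-109.05, 37.0, -102.05, 41.0),
    "Wyoming": (-111.05, 41.0, -104.05, 45.0),
}


def locate_state(lon, lat):
    """Return the first state whose box contains the point, else None."""
    for state, (min_lon, min_lat, max_lon, max_lat) in STATE_BOXES.items():
        if min_lon <= lon < max_lon and min_lat <= lat < max_lat:
            return state
    return None


def map_tweet(tweet):
    """Map step: emit (state, 1) for May tweets that fall inside a state."""
    if tweet["month"] != 5:
        return []
    state = locate_state(tweet["lon"], tweet["lat"])
    return [(state, 1)] if state else []


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per state."""
    counts = Counter()
    for state, n in pairs:
        counts[state] += n
    return dict(counts)


# Hypothetical geotagged tweets.
tweets = [
    {"lon": -104.99, "lat": 39.74, "month": 5},  # Denver area
    {"lon": -105.27, "lat": 40.01, "month": 5},  # Boulder area
    {"lon": -104.82, "lat": 41.14, "month": 5},  # Cheyenne area
    {"lon": -104.99, "lat": 39.74, "month": 4},  # wrong month
    {"lon": -74.01, "lat": 40.71, "month": 5},   # outside both boxes
]

pairs = [pair for tweet in tweets for pair in map_tweet(tweet)]
tweets_by_state = reduce_counts(pairs)
print(tweets_by_state)
```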

Page 20: Oracle big data spatial and graph

Graph Data Model

What is a graph?

– A set of vertices (nodes) and edges (links), optionally with properties

– A graph is simply linked data

Why do we care?

– Graphs are everywhere

• Social networks/Social Web (Facebook, LinkedIn, Twitter, Baidu, Google+, …)

• Cyber networks, power grids, protein interaction graphs

• Knowledge graphs (IBM Watson, Apple Siri, Google Knowledge Graph)

– Graphs are intuitive and flexible

• Easy to navigate, easy to form a path, natural to visualize

• Do not require a predefined schema

[Figure: example graph with vertices A, B, C, D, E, F]

Page 21: Oracle big data spatial and graph

Why Graph Databases Now?

• Rise of social networking

– Google, Yahoo, Twitter, Facebook, LinkedIn

• Enterprise applications increasingly need to model data relationships

– Telecoms: Network & data center management, identity management

– Financial Services: Fraud detection; cross-selling

– Media & Publishing: Social apps, recommendation, sentiment

– Health Care: CRM, fraud detection

• Modeling complex relationships as graphs is efficient

– Improves performance

– Simplifies queries, traversal, search and analytics

Page 22: Oracle big data spatial and graph

Graphs Are Big . . . and Getting Bigger

• Social Scale*: 1 billion vertices, 100 billion edges

• Web Scale*: 50 billion vertices, 1 trillion edges

• Brain Scale*: 100 billion vertices, 100 trillion edges

* An NSA Big Graph Experiment

http://www.pdl.cmu.edu/SDI/2013/slides/big_graph_nsa_rd_2013_56002v1.pdf
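
To put these sizes in perspective, the short calculation below derives the average number of edges per vertex at each scale and the raw size of a bare edge list. The 16 bytes per edge (two 8-byte vertex IDs, no properties, no index) is an assumption chosen for illustration. Real storage formats differ.

```python
# (vertices, edges) per scale, taken from the slide.
SCALES = {
    "Social": (1_000_000_000, 100_000_000_000),
    "Web": (50_000_000_000, 1_000_000_000_000),
    "Brain": (100_000_000_000, 100_000_000_000_000),
}

BYTES_PER_EDGE = 16  # assumption: two 8-byte vertex IDs per edge


def summarize(vertices, edges):
    """Average edges per vertex and raw edge-list size in terabytes."""
    return {
        "edges_per_vertex": edges / vertices,
        "edge_list_tb": edges * BYTES_PER_EDGE / 10**12,
    }


summary = {name: summarize(v, e) for name, (v, e) in SCALES.items()}

for name, stats in summary.items():
    print(name, stats)
```

Under this assumption the social-scale edge list is about 1.6 TB and the brain-scale one about 1.6 PB, before any vertex or edge properties are stored.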

Page 23: Oracle big data spatial and graph

Graph for Social and Unstructured Data Analysis

Graph is a powerful tool for data analysis

• By representing your data as a graph, you capture fine-grained, arbitrary relationships between data entities

• Individual relationships are represented as links

• When analyzing such a graph, you use explicit relationships to find implicit information about your data, without computing multiple joins

Page 24: Oracle big data spatial and graph

Oracle Supports Different Kinds of Graphs

Use Case: Social Network Analysis

Graph Model: Property Graph Model (Graph Search & Analysis, Big Data analytics, Entity analytics)

Industry Domain: National Intelligence, Public Safety, Social Media search, Marketing - Sentiment

Use Case: Linked Data / Enterprise Metadata

Graph Model: RDF Data Model (Data federation, Knowledge representation, Graph pattern analysis)

Industry Domain: Life Sciences, Health Care, Publishing, Finance

Use Case: Spatial Network Analysis

Graph Model: Network Data Model (Network path analysis, Multi-model modeling)

Industry Domain: Logistics, Transportation, Utilities, Telecoms

Page 25: Oracle big data spatial and graph

The Property Graph Data Model

• A set of vertices (or nodes)

– each vertex has a unique identifier.

– each vertex has a set of in/out edges.

– each vertex has a collection of key-value properties.

• A set of edges (or links)

– each edge has a unique identifier.

– each edge has a head/tail vertex.

– each edge has a label denoting type of relationship between two vertices.

– each edge has a collection of key-value properties.

https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model
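
The model above maps directly onto a small in-memory data structure. The Python sketch below is an illustration of the model only: the class and method names are assumptions and do not correspond to the Blueprints or Oracle APIs. Following the Blueprints convention, the tail is the vertex an edge leaves and the head is the vertex it enters.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: int                                         # unique identifier
    properties: dict = field(default_factory=dict)  # key-value properties
    out_edges: list = field(default_factory=list)   # IDs of outgoing edges
    in_edges: list = field(default_factory=list)    # IDs of incoming edges


@dataclass
class Edge:
    id: int                                         # unique identifier
    tail: int                                       # ID of the source vertex
    head: int                                       # ID of the target vertex
    label: str                                      # type of relationship
    properties: dict = field(default_factory=dict)  # key-value properties


class PropertyGraph:
    def __init__(self):
        self.vertices = {}
        self.edges = {}

    def add_vertex(self, vertex_id, **properties):
        if vertex_id in self.vertices:
            raise ValueError(f"duplicate vertex id {vertex_id}")
        self.vertices[vertex_id] = Vertex(vertex_id, dict(properties))
        return self.vertices[vertex_id]

    def add_edge(self, edge_id, tail, head, label, **properties):
        if edge_id in self.edges:
            raise ValueError(f"duplicate edge id {edge_id}")
        edge = Edge(edge_id, tail, head, label, dict(properties))
        # Register the edge on both endpoints so adjacency lookups are cheap.
        self.vertices[tail].out_edges.append(edge_id)
        self.vertices[head].in_edges.append(edge_id)
        self.edges[edge_id] = edge
        return edge


g = PropertyGraph()
g.add_vertex(1, name="Alice", age=34)
g.add_vertex(2, name="Bob")
g.add_edge(100, 1, 2, "knows", since=2012)
print(g.vertices[1], g.edges[100])
```

Storing edge IDs on each vertex, rather than edge objects, keeps the two classes free of circular references.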

Page 26: Oracle big data spatial and graph

Common Graph Analysis Use Cases

• Product Recommendation: recommend the most similar item purchased by similar people (example graph: purchase records linking customers and items)

• Influencer Identification: find people who are central in the given network, e.g. influencer marketing (example graph: communication stream, e.g. tweets)

• Community Detection: identify groups of people that are close to each other, e.g. target group marketing

• Graph Pattern Matching: find all the sets of entities that match a given pattern, e.g. fraud detection
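
As an illustration of the product recommendation case, the sketch below walks a customer-item purchase graph: it finds customers who share purchases with the target customer and scores the items they bought by the size of that overlap. The scoring rule, the sample data, and the function name are assumptions made for this example. The product's own recommender functions may work differently.

```python
from collections import Counter

# Hypothetical purchase records: customer -> set of purchased items.
purchases = {
    "alice": {"book", "lamp"},
    "bob": {"book", "lamp", "desk"},
    "carol": {"book", "desk", "pen"},
    "dave": {"phone"},
}


def recommend(customer, top_n=1):
    """Rank items the customer does not own by similarity of their buyers."""
    owned = purchases[customer]
    scores = Counter()
    for other, items in purchases.items():
        if other == customer:
            continue
        overlap = len(owned & items)  # shared purchases = similarity
        if overlap == 0:
            continue
        for item in items - owned:
            scores[item] += overlap
    return [item for item, _ in scores.most_common(top_n)]


print(recommend("alice"))
```

Here "desk" wins because both similar customers bought it, and the closer match (bob, with two shared items) contributes more weight.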

Page 27: Oracle big data spatial and graph

Graph Analysis Examples:

• Attribute searching (Get people with a given name)

• Vertex/edge adjacency (Get people that like a given Web page)

• Fixed-length paths (Get the friends of the friends of a given person)

• Reachability (Is there a “friend” connection between two people?)

• Pattern matching (Get the common friends between two people)

• Aggregates (Get the number of friends of a given person)
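
Each of the six query types above can be expressed in a few lines over an adjacency structure. The following sketch uses plain Python dictionaries and a hypothetical six-person network. The data and function names are illustrative and are not taken from the product API.

```python
from collections import deque

# Hypothetical social graph.
people = {1: "Alice", 2: "Bob", 3: "Carol", 4: "Dave", 5: "Erin", 6: "Frank"}
friend_pairs = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]
likes = {1: {"oracle.com"}, 2: {"oracle.com", "example.org"}, 5: {"example.org"}}

# Undirected "friend" adjacency sets.
friends = {pid: set() for pid in people}
for a, b in friend_pairs:
    friends[a].add(b)
    friends[b].add(a)


def find_by_name(name):
    """Attribute search: people with a given name."""
    return [pid for pid, n in people.items() if n == name]


def who_likes(page):
    """Adjacency: people that like a given Web page."""
    return sorted(pid for pid, pages in likes.items() if page in pages)


def friends_of_friends(pid):
    """Fixed-length path: people exactly two hops away."""
    two_hops = set()
    for f in friends[pid]:
        two_hops |= friends[f]
    return two_hops - friends[pid] - {pid}


def is_reachable(src, dst):
    """Reachability: breadth-first search over friend edges."""
    seen, queue = {src}, deque([src])
    while queue:
        current = queue.popleft()
        if current == dst:
            return True
        for nxt in friends[current] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return False


def common_friends(a, b):
    """Pattern matching: vertices adjacent to both a and b."""
    return friends[a] & friends[b]


def friend_count(pid):
    """Aggregate: number of friends."""
    return len(friends[pid])


print(find_by_name("Carol"), who_likes("oracle.com"), friends_of_friends(1))
print(is_reachable(1, 5), is_reachable(1, 6), common_friends(1, 4), friend_count(4))
```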

Page 28: Oracle big data spatial and graph

Use cases

• Social Media Management & Analysis: identify common friends and interests; identify primary influencers in a social network; graph data management for large social media services

• Online Product Recommendations: recommender systems; sentiment analysis; customer churn analysis; customer behavior analytics; customer trend prediction

• Internet of Things: manage data properties and relationships for complex webs of inter-operating devices and systems; predictive modeling of system behavior

• Cyber-Security: fraud detection; identity management; reveal clusters of similar behaviors and properties; discover relationships based on pattern matching

Page 29: Oracle big data spatial and graph

Product Details: Graph Features

Page 30: Oracle big data spatial and graph

Graph Architecture

Oracle Big Data Spatial and Graph

[Figure: layered architecture diagram. Labels as extracted:]

• Client access: REST Web Service (Python, Perl, PHP, Ruby, JavaScript, …) and Java APIs

• Graph Analytics: In-memory Analytic Engine

• Graph Data Access Layer API: Blueprints & SolrCloud / Lucene

• Scalable and Persistent Storage: Property Graph Support on Apache HBase and Oracle NoSQL

Page 31: Oracle big data spatial and graph

Big Data Spatial and Graph: Property Graph Features

• Highly scalable graph database and analytics engine

• Implemented on Apache HBase and Oracle NoSQL Database

• Rich developer APIs

– Blueprints, REST, and Java graph APIs, plus support for Groovy, Python, PHP, Perl, Ruby, and JavaScript

• Fast, scalable suite of social network analysis functions

– Ranking, centrality, recommender, community detection, path finding…

– Targeted to address main industry requirements

• Manageability

– Bulk load

– Console to execute Java and Gremlin APIs

Page 32: Oracle big data spatial and graph

Support for Open Source TinkerPop Graph Tool Stack

The Oracle Big Data Spatial and Graph Blueprints API implementation provides support for the TinkerPop component stack, the de facto standard for graph databases.

The stack includes a query language, dataflow, REST APIs, and other components.

Page 33: Oracle big data spatial and graph

Property Graph Data Access Layer

• TinkerPop tools supported: Blueprints, Gremlin, Rexster

• Graph schema optimized on Apache HBase

• Graph schema optimized on Oracle NoSQL Database

• GraphML, GML, GraphSON, and Oracle-defined flat files (.ope & .opv)

• Bulk load of property graph data

Page 34: Oracle big data spatial and graph

Property Graph Data Access Layer

• Parallel scan of property graph data

• Apache Lucene text search of graph data

• Groovy shell for accessing property graph data

• IPython-based interface example

• SolrCloud integration

Page 35: Oracle big data spatial and graph

Developer APIs for Analytics

• Blueprints APIs

– Modify/insert/delete graph edges, vertices, key-values

– Blueprints stack (Gremlin, Pipes, etc.) provides additional functionality

• Java: High level graph analysis APIs expose core functions of graph engine

– Customers perform graph analysis using these Java APIs

– Graph and subgraph identification (using key-value constraints)

– R access through Java APIs

• REST APIs

– Graph analysis

– Graph and sub-graph identification (using key-value constraints)

• Web scripting languages supported through REST APIs above

– PHP, Python, Ruby, Groovy, etc.
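
"Graph and subgraph identification (using key-value constraints)" can be pictured as a filter: keep the vertices whose properties match every constraint, then keep the edges whose endpoints both survive. The sketch below shows that idea on plain Python data. The function name, its signature, and the sample graph are assumptions and do not reproduce the Java or REST APIs.

```python
# Hypothetical graph: vertex ID -> properties, plus (source, target, label) edges.
vertices = {
    1: {"type": "person", "country": "IL"},
    2: {"type": "person", "country": "US"},
    3: {"type": "person", "country": "IL"},
    4: {"type": "page", "country": "IL"},
}
edges = [
    (1, 3, "friend"),
    (1, 2, "friend"),
    (3, 4, "likes"),
    (3, 1, "friend"),
]


def subgraph(vertices, edges, **constraints):
    """Keep vertices matching all key-value constraints and the edges between them."""
    kept = {
        vid: props
        for vid, props in vertices.items()
        if all(props.get(key) == value for key, value in constraints.items())
    }
    kept_edges = [(s, d, label) for s, d, label in edges if s in kept and d in kept]
    return kept, kept_edges


sub_vertices, sub_edges = subgraph(vertices, edges, type="person", country="IL")
print(sorted(sub_vertices), sub_edges)
```

Filtering before analysis keeps the in-memory working set small, which matters when only part of a large stored graph is relevant to a question.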

Page 36: Oracle big data spatial and graph

Property Graph In-Memory Graph Analytics

• Parallel data reading from data access layer into memory

• Choice of deployment

– Standalone application server

– On Hadoop node

• Graph formats (in addition to those supported by the data access layer)

– EBin, Adjacency list, Edge List

• J2EE container support (WLS, Tomcat, Jetty)

Page 37: Oracle big data spatial and graph

35 Graph Functions

• Detecting Components and Communities: Tarjan’s, Kosaraju’s, Weakly Connected Components, Label Propagation (w/ variants), Soman and Narang’s

• Ranking and Walking: PageRank, Personalized PageRank, Betweenness Centrality (w/ variants), Closeness Centrality, Degree Centrality, Eigenvector Centrality, HITS, Random walking and sampling (w/ variants)

• Evaluating Community Structures: Conductance, Modularity, Clustering Coefficient (Triangle Counting), Adamic-Adar

• Path-Finding: Hop-Distance (BFS), Dijkstra’s, Bi-directional Dijkstra’s, Bellman-Ford’s

• Link Prediction: SALSA (Twitter’s Who-to-follow)

• Other Classics: Vertex Cover, Minimum Spanning Tree (Prim’s)
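
To give a feel for what one of these functions computes, here is a textbook PageRank by power iteration on a three-vertex graph. It is a minimal single-threaded sketch under the usual assumptions (damping factor 0.85, rank of dangling vertices spread evenly), not the product's parallel implementation, and the function name and signature are illustrative.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Power-iteration PageRank.

    out_links maps every vertex to the list of vertices it links to.
    Every link target must also appear as a key.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by vertices without out-links is shared by all vertices.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {
            v: (1.0 - damping) / n + damping * dangling / n for v in nodes
        }
        for v in nodes:
            for w in out_links[v]:
                new_rank[w] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    return rank


# Illustrative graph: A links to B and C, B links to C, C links to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)

for vertex in sorted(ranks, key=ranks.get, reverse=True):
    print(vertex, round(ranks[vertex], 4))
```

C ranks highest because it receives links from both other vertices, and A follows because it receives all of C's rank.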

Page 39: Oracle big data spatial and graph

In-Memory Graph Analysis Framework

• Large graph analysis is time-consuming because …

– The computation typically involves touching most nodes and edges in the graph

– The data-access pattern is random

• In-memory, parallel framework for fast graph analytics

• Exploits the architecture of modern servers

– The computation is parallelized using multiple CPU cores

– The non-sequential data-access is mitigated with large DRAMs
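
The parallelization pattern can be sketched as partition, compute, and merge: split the in-memory edge list into chunks, let each worker compute a partial result, and combine the partials. The example below counts vertex degrees this way. It only illustrates the structure: Python threads do not give a real multi-core speedup for this kind of work, and the function names and sample data are assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(items, parts):
    """Split a list into at most `parts` contiguous chunks."""
    size = max(1, (len(items) + parts - 1) // parts)
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_degrees(edge_chunk):
    """Worker: partial degree counts for one chunk of edges."""
    counts = Counter()
    for src, dst in edge_chunk:
        counts[src] += 1
        counts[dst] += 1
    return counts


def parallel_degrees(edges, workers=4):
    """Compute vertex degrees by merging per-chunk partial counts."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_degrees, partition(edges, workers)):
            total.update(partial)
    return dict(total)


# Illustrative in-memory edge list.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 0)]
degrees = parallel_degrees(edges)
print(degrees)
```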

Page 40: Oracle big data spatial and graph

Oracle Property Graph Engine

• Reads graph from Apache HBase or Oracle NoSQL

• Data Access Layer filtering used to create subgraph for Property Graph Engine analytics

[Figure: diagram. Labels: Oracle Property Graph or RDF (HBase or NoSQL); Property Graph Engine; multiple Analytic Requests; Transactional Request]
