graph databases and graph analytics - itoug.it · pdf file•deep learning + graph analysis...

49

Upload: letuyen

Post on 09-Mar-2018

217 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search
Page 2: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

Graph Databases and Graph Analytics – Just a Hype or the End of the Relational World? ITOUG Tech Day 2017

Hans Viehmann Product Manager EMEA Milano, June 8th , 2017 @SpatialHannes

Page 3: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

3

Page 4: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. | 4

https://twitter.jeffprod.com

Following, no follow back Follower, no follow back Follow each other

Page 5: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

• What is a graph?

– Data model representing entities as vertices and relationships as edges

– Optionally including attributes

– Also known as „linked data“

• What are typical graphs?

– Social Networks • LinkedIn, Facebook, Google+, Twitter, ...

– Physical networks, Supplier networks,...

– Knowledge Graphs • Apple SIRI, Google Knowledge Graph, ...

Graph Data Model

E

A D

C B

F

5

Page 6: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

• Why are graphs popular?

– Easy data modeling • „whiteboard friendly“

– Flexible data model • No predefined schema, easily extensible

• Particularly useful for sparse data

– Insight from graphical representation • Intuitive visualization

– Enabling new kinds of analysis • Overcoming some limitations in relational

technology

• Basis for Machine Learning (Neural Networks)

Graph Data Model

E

A D

C B

F

6

Page 7: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

Background: Three Types of Graph Data Models

RDF Data Model

• Data federation

• Knowledge representation

• Semantic Web

Social Network Analysis

General Purpose Analysis

Linked Data / Metadata Layer

Property Graph Model

• Graph Data Management

• Social Network Analysis

• Entity analytics

Purpose-built for Linked Data and Semantic Web, conforming to W3C RDF standards

Spatial Network Analysis

Purpose-built for Spatial Network Analysis

Network Data Model

• Network path analysis

• Transportation modeling

7

Page 8: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

Computational Graph Analytics

• Compute values on vertices and edges

• Traversing graph or iterating over graph (usually repeatedly)

• Procedural logic

• Examples:

– Shortest Path, PageRank, Weakly Connected Components, Centrality, ...

Graph Pattern Matching

• Based on description of pattern

• Find all matching sub-graphs

Categories of Graph Analysis

:Person{100} name = ‘Amber’ age = 25

:Person{200} name = ‘Paul’ age = 30

:Person{300} name = ‘Heather’ age = 27

:Company{777} name = ‘Oracle’ location = ‘Redwood City’

:worksAt{1831} startDate = ’09/01/2015’

:friendOf{1173}

:knows{2200}

:friendOf {2513} since = ’08/01/2014’

8

Page 9: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

Examples for Graph Analytics

• Community detection and influencer analysis

– Churn risk analysis/targeted marketing, HR Turnover analysis

• Product recommendation

– Collaborative filtering, clustering

• Anomaly detection

– Social Network Analysis (spam detection), fraud detection in healthcare

• Path analysis and reachability

– Outage analysis in utilities networks, vulnerability analysis in IP networks, „Panama Papers“

• Pattern matching

– Tax fraud detection, data extraction

9

Page 10: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

• Requirement:

– Identify entities from a graph dataset that are relatively more important than others (from topology)

• Approaches:

– Determine centrality of entities (concept based on graph theory)

Graph Analysis: Influencer Identification

10

Page 11: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

• Centrality is measure of relative importance of vertices in a graph

• Many variations of centrality in graph theory

– Betweenness Centrality

– Closeness Centrality

– Eigenvector Centrality

– Pagerank

– HITS (Hyperlink-Induced Topic Search)

– …

Graph Analysis: Influencer Identification

Betweenness centrality Eigenvector centrality

(images from Wikipedia)

Each algorithm suggests different definition of importance

11

Page 12: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

• Measuring importance using Page Rank

• Original algorithm developed by Larry Page for ranking in Google

• Making a node connected to by important nodes also important

• Can be measure of trust or prominence

Confidential – Oracle Internal/Restricted/Highly Restricted 12

Graph Analysis: Influencer Identification

Friend VIP

Customer

Son

Page 13: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

Use case #1: Targeted Marketing in Telco

• Model each subscriber as a vertex in the graph

• Interactions between subscribers are represented by edges

– Taking into account both on-net and off-net

• Based on call data records for voice, SMS, MMS

– Usually combining all interactions in a property representing the strength of the edge

• Using centrality algorithms to determine important customers

• Target these customers with marketing campaigns for retention – Reducing churn risk for all additional customers he/she is connected with

13

Page 14: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

• Requirement:

– Need customer-item interactions such as purchases or rating records

• Approaches:

– Create graph of customers and items

– Run Personalized Pagerank using target customer as starting point

– Optionally cluster customers for further analysis

– (can also be used to find anomalies)

Use case #2: Product Recommendation in Retail

14

Page 15: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

• Requirement:

– Identify entities from a large dataset that look different than others, especially in their relationships

• Approaches:

– Define an anomaly pattern, find all instances of the pattern in the graph

– Given nodes in the same category, find nodes that stand out (eg. low Pagerank value)

Graph Analysis: Anomaly Detection

16

Page 16: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

• Example for potential fraud detection

– Public domain dataset

– Medical providers and their operations

• Question

– Are there any medical providers that are suspicious

medical providers that perform different operations than their fellows

(e.g. eye doctors doing plastic surgery)

• Approach – Create graph between doctors and

operations

– Apply personalized pagerank (a.k.a equivalent to random walking)

– Identify doctors that are far from their fellows

Use case #3: Fraud Detection in Healthcare

Clinics (doctors)

Operations

17

Page 17: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

• Requirement:

– Identify all entities from a graph dataset that are connected with a given entity

– Determine how entities are connected to each other (ie. via which paths)

• Approaches:

– Traverse the graph starting from the specified vertex

Graph Analysis: Path and Reachability

18

Page 18: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

• Real-world use case from a utilities company

• Analyzing power distribution network

– Vertices: Generators, Transformers, Switches, ...

– Edges: transmission lines

• Question

– Which households have power when some given switches are turned off

Use case #4: Network Outage Analysis

19

Page 19: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

• Represent the data as a graph – Fits very naturally – Note that vertices and edges have extra

information or properties

• Answer the question in natural ways – Starting from the given vertex, – traverse the graph and mark reachable vertices

– but without going through ‘off’ switches

Use case #4: Network Outage Analysis ID: 2018281 Type: Generator Name: XFM_Sub

ID: 27080172 Type: Switch Status: Off Name: SW_38

Graph representation allows:

• Intuitive description of graph traversal

• Fast edge traversal without computing joins

20

Page 20: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

• Network Intrusion Detection

• Deep Learning + Graph Analysis

Property Graph

• Blue edges: malicious

• Other edges: normal traffic

• Many attacks originated from

175.45.176.1 to target

149.171.126.17

• Visualization tool: Cytoscape v3.2.1

+ Big Data Spatial and Graph v2.1

Train Neural Network model

Data Cleansing & preparation

Generate Property Graph

Load Property Graph into BDSG

Graph Visualization

Dataset selection

Page 21: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

Oracle Big Data Spatial and Graph

• Available for Big Data platform

– Hadoop, HBase, Oracle NoSQL

• Supported both on BDA and commodity hardware

– CDH and Hortonworks

• Database connectivity through Big Data Connectors or Big Data SQL

• Part of Big Data Cloud Service

Oracle Spatial and Graph (DB option)

• Available with Oracle 12.2 (EE)

• Using tables for graph persistence

• In-database graph analytics

– Sparsification, shortest path, page rank, triangle counting, WCC, sub graph generation…

• SQL queries possible

– Integration with Spatial, Text, Label Security, RDF Views, etc.

22

In-memory Analytics Engine – Product Packaging

Page 22: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

Oracle Graph Analytics Architecture

Scalable and Persistent Storage

Graph Data Access Layer API

Graph Analytics In-memory Analytic Engine

REST W

eb Service

Blueprints & SolrCloud / Lucene

Property Graph Support on Apache HBase, Oracle NoSQL or Oracle 12.2

Pyth

on

, Perl, PH

P, Ru

by,

Javascript, …

Java APIs

Java APIs

23

Page 23: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

The Property Graph Data Model

• A set of vertices (or nodes) – each vertex has a unique identifier.

– each vertex has a set of in/out edges.

– each vertex has a collection of key-value properties.

• A set of edges (or links) – each edge has a unique identifier.

– each edge has a head/tail vertex.

– each edge has a label denoting type of relationship between two vertices.

– each edge has a collection of key-value properties.

https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model

Page 24: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

Creating a Graph

• From a relational model

– Rows in tables usually become vertices

– Columns become properties on vertices

– Relationships become edges

– Join tables in n:m relations are transformed into relationships, columns become properties on edges

• Through API or interactively using a graphical tool

– Adding vertices, edges, properties to a given graph

• From graph exchange formats – GraphML, GraphSON, GML (Graph Modeling Language)

25

Page 25: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

Interacting with the Graph

• Access through APIs

– Implementation of Apache Tinkerpop Blueprints APIs

– Based on Java, REST plus SolR Cloud/Lucene support for text search

• Scripting – Groovy, Python, Javascript, ...

– Apache Zeppelin integration, Javascript (Node.js) language binding

• Graphical UIs

– Cytoscape, plug-in available for BDSG

– Commercial Tools such as TomSawyer Perspectives

No SQL and no SQL*Plus

26

Page 26: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

Graph Analysis Algorithms can be very hard to code ...

• Example: Find the size of the 2-hop network of vertices (Gremlin+Python)

• Single API call instead

– Analysis in memory, in parallel

• Results can be persisted in Graph store and accessed from Oracle Database – Big Data SQL, Connectors

Oracle Big Data Spatial and Graph comes with 40+ pre-built algorithms

sum([v.query() \

.direction(blueprints.Direction.OUT).count() \

for v in OPGIterator(v0.query() \

.direction(blueprints.Direction.OUT) \

.vertices().iterator())])

Page 27: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

Example: Betweenness Centrality in Big Data Graph

Code

31

D

A

C

E

B

F

I

J

H

K

G

analyst.vertexBetweennessCentrality(pg)

.getTopKValues(15)

Page 28: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. | 32

Using Notebooks

Page 29: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. | 33

Using Notebooks

Page 30: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

Structure Evaluation – Conductance

– countTriangles

– inDegreeDistribution

– outDegreeDistribution

– partitionConductance

– partitionModularity

– sparsify

– K-Core computes

Community Detection – communitiesLabelPropagation

Ranking – closenessCentralityUnitLength

– degreeCentrality

– eigenvectorCentrality

– Hyperlink-Induced Topic Search (HITS)

– inDegreeCentrality

– nodeBetweennessCentrality

– outDegreeCentrality

– Pagerank, weighted Pagerank

– approximatePagerank

– personalizedPagerank

– randomWalkWithRestart

34

Social Network Analysis Algorithms (1)

Page 31: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

Pathfinding – fattestPath

– shortestPathBellmanFord

– shortestPathBellmanFordReverse

– shortestPathDijkstra

– shortestPathDijkstraBidirectional

– shortestPathFilteredDijkstra

– shortestPathFilteredDijkstraBidirectional

– shortestPathHopDist

– shortestPathHopDistReverse

Recommendation – salsa

– personalizedSalsa

– whomToFollow

Classic - Connected Components – sccKosaraju

– sccTarjan

– wcc

35

Social Network Analysis Algorithms (2)

Page 32: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

• SQL-like syntax but with graph pattern description and property access

– Interactive (real-time) analysis

– Supporting aggregates, comparison, such as max, min, order by, group by

• Finding a given pattern in graph

– Fraud detection

– Anomaly detection

– Subgraph extraction

– ...

• Proposed for standardization by Oracle

– Specification available on-line

– Open-sourced front-end (i.e. parser)

Pattern matching using PGQL

https://github.com/oracle/pgql-lang

36

Page 33: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

PGQL Example query

• Find all instances of a given pattern/template in data graph

• Fast, scaleable query mechanism

SELECT v3.name, v3.age FROM ‘myGraph’ WHERE (v1:Person WITH name = ‘Amber’) –[:friendOf]-> (v2:Person) –[:knows]-> (v3:Person)

query

Query: Find all people who are known to friends of ‘Amber’.

data graph ‘myGraph’

:Person{100} name = ‘Amber’ age = 25

:Person{200} name = ‘Paul’ age = 30

:Person{300} name = ‘Heather’ age = 27

:Company{777} name = ‘Oracle’ location = ‘Redwood City’

:worksAt{1831} startDate = ’09/01/2015’

:friendOf{1173}

:knows{2200}

:friendOf {2513} since = ’08/01/2014’

37

Page 34: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

OAAgraph integration with R

• OAAgraph integrates in-memory engine into ORE and ORAAH

• Adds powerful graph analytics and querying capabilities to existing analytical portfolio of ORE and ORAAH

• Built in algorithms of PGX available as R functions

• PGQL pattern matching

• Concept of “cursor” allows browsing of in-memory analytical results using R data structures (R data frame), allows further client-side processing in R

• Exporting data back to Database / Spark allows persistence of results and further processing using existing ORE and ORAAH analytical functions

Page 35: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

Use case

• Load persons data into ORE

• Check the data set

• Cluster persons by their age with K-means

• Load calls data into ORE

• Create an OAAgraph object with persons and calls

• Compute Pagerank and check results

• Export results back to ORE

• Cluster persons by their age AND pagerank values (with K-means)

Page 36: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

SPARK

SPARK

SPARK

PGX

PGX

Graph Analytics on SPARK • Use SPARK for conventional tabular

data processing (RDD, Dataframe, -set)

• Define graph view of the data – View it as node table and edge table

• Load into PGX

• Execute graph algorithms in PGX – Orders of magnitude faster than GraphX

– More scaleable

• Push analysis results back into SPARK as additional tables

• Continue SPARK analysis

SPARK data structure and communication mechanism not optimized for graph analysis workloads

Page 37: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

Text Search through Apache Lucene/Solr

• Use text indexing to access vertices or edges

– Eg. find person with given name as starting point for reachability analysis

– oraclePropertyGraph.createKeyIndex(“name”, Vertex.class);

– oraclePropertyGraph.getVertices(“name”, “*Obama*”, true);

• Based on Apache Solr/Solr Cloud

– Highly scaleable through sharding and replication

• Uses Apache Lucene under the covers

– open source text search engine library

– inverted index, ranked searching, fuzzy matching …

• Supports manual and auto indexing of Graph elements

41

Page 38: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

In-memory Analytics Engine Deployment options

PGX

HBASE / NoSQL

Client initiates PGX as a YARN task

Interactive (private server) Execution

pgx> :loadGraph mygraph.json …

BDA Cluster

Client

Client controls PGX via an interactive shell

...

pgx> :pagerank mygraph 0.85 …

To load the Graph and run the analysis

Shared Server

PGX Server

RDF / HBASE / NoSQL

PGX can be configured as a service, with certain graphs pre-loaded And shared by multiple

clients

Batch Mode

:loadGraph …

:pagerank …

Client can submit a PGX script as a batch job

PGX

Dry Run (Local Execution)

… Client can run PGX locally with small data set Data File

42

Page 39: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

A Word on Performance Sub-millisecond Performance for Graph Operations in NoSQL

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

Time (ms)

Oracle Big Data Spatial and Graph: Property Graph – Data Access Oracle NoSQL Database: Graph Operations On Twitter Data

(50K vertices, 50K edges, 10 K/V pairs for each)

43

Page 40: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

Graph Analysis: Performance Compared with Neo4J

0.01

0.1

1

10

100

1000

10000

100000

Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8

Tim

e (

ms)

Linux Kernel analysis on X86

PGX Neo4j

210x 276x 285x 276x 225x 909x

11987x

525x

Basic graph pattern Path queries Path queries Path queries Path queries Single

shortest path Bulk shortest

path

X86 Server

Xeon E5-2660 2.2Ghz

2 socket

x 8 cores

x 2HT

256GB DRAM

Neo4J: 2.2.1

Data:

- Linux kernel code as a

graph

- Program analysis queries

Path queries of Linux kernel source code

Huge performance advantage over Neo4J graph DB

(2~4 orders of magnitude)

44

Page 41: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

Distributed Graph Analysis Engine

• Oracle Big Data Spatial and Graph uses very compact graph representation

– Can fit graph with ~23bn edges into one BDA node

• Distributed implementation scales beyond this

– Processing even larger graphs with several machines in a cluster (scale-out)

– Interconnected through fast network (Ethernet or, ideally, Infiniband)

• Integrated with YARN for resource management

– Same client interface, but not all APIs implemented yet

• Again, much faster than other implementations

– Comprehensive performance comparison with GraphX, GraphLab

Handling extremely large graphs

45

Page 42: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

Graph visualization – Cytoscape, Vis.js, ...

46

Page 43: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

Graph Visualization – Commercial Tools TomSawyer Perspectives 7.5 has Property Graph pre-integrated

47

Page 44: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

Linkurious Ogma

• Server-based (Node.JS)

• Light-weight JavaScript visualizer

• Powerful rendering

• Oracle integration

48

https://www.slideshare.net/Linkurious/how-to-visualize-oracle-big-data-spatial-and-graph-with-ogma

https://linkurio.us/visualize-oracle-graph-data-ogma-library/

https://linkurio.us

Page 45: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

Summary Graph capabilities in Oracle Big Data Spatial and Graph

• Graph databases are powerful tools, complementing relational databases

– Especially strong for analysis of graph topology and multi-hop relationships

• Graph analytics offer new insight

– Especially relationships, dependencies and behavioural patterns

• Oracle Big Data Spatial and Graph offers – Comprehensive analytics through various APIs, integration with relational database

– Scaleable, parallel in-memory processing

– Secure and scaleable graph storage on Hadoop using Oracle NoSQL or HBase

• Runs on commodity hardware or BDA, both on-premise or in the Cloud

49

Page 46: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

Resources

• Oracle Big Data Spatial and Graph OTN product page: www.oracle.com/technetwork/database/database-technologies/bigdata-spatialandgraph

– White papers, software downloads, documentation and videos

• Oracle Big Data Lite Virtual Machine - a free sandbox to get started: www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html

• Hands On Lab included in /opt/oracle/oracle-spatial-graph/

– Content also available on GITHub under http://github.com/oracle/BigDataLite/

• Blog – examples, tips & tricks: blogs.oracle.com/bigdataspatialgraph

• @OracleBigData, @SpatialHannes, @agodfrin, @JeanIhm

• Oracle Spatial and Graph Group

50

Page 47: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

Q&A

51

Page 48: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. | 52

Page 49: Graph Databases and Graph Analytics - itoug.it · PDF file•Deep Learning + Graph Analysis ... Blueprints & SolrCloud / Lucene ... REST plus SolR Cloud/Lucene support for text search