oracle big data spatial and graph
TRANSCRIPT
Copyright © 2014 Oracle and/or its affiliates. All rights reserved.
Big Data Spatial and Graph An Overview
ilOUG Tech Days - June, 2015 Michel Benoliel Master Principal Consultant Oracle Israel
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
2
Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Agenda
• Spatial and Graph Strategy
• Introduction to Big Data Spatial and Graph
• Spatial Features
•Graph Features
3
Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Oracle’s Spatial and Graph Strategy Enable Spatial and Graph use cases on every Big Data platform
NoSQL
Oracle Big Data Spatial and Graph Oracle Database
Spatial and Graph
4
Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Conventional database or Big Data technologies Typical technical decision criteria
0
1
2
3
4
5
Tooling maturity
Stringent Non-Functionals
ACID transactional requirement
Security
Variety of data formats
Data sparsity
ETL simplicity
Cost effectively store low value data
Ingestion rate
Straight Through Processing (STP)
Hadoop
Relational
Hadoop and/or NoSQL Relational Database
5
Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
The Big Picture – Oracle Big Data Management System SO
UR
CES
DATA RESERVOIR DATA WAREHOUSE
Oracle Database
Oracle Industry Models
Oracle Advanced Analytics
Oracle Spatial & Graph
Big Data Appliance
Apache Flume
Oracle GoldenGate
Oracle Event Processing
Cloudera Hadoop
Oracle Big Data SQL
Oracle NoSQL
Oracle R Distribution
Oracle Big Data Spatial and Graph
Oracle Database
In-Memory, Multi-tenant
Oracle Industry Models
Oracle Advanced Analytics
Oracle Spatial and Graph
Exadata
Oracle GoldenGate
Oracle Event Processing
Oracle Data Integrator
Oracle Big Data Connectors
Oracle Data Integrator B
6
Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Oracle Big Data Spatial and Graph
Property Graph for Analysis of:
• Social Media relationships
• Internet of Things interactions
• Cyber-Security
Spatial Analysis Features for:
• Location Data Enrichment
• Proximity and containment analysis
• Preparation of digital map and imagery data sets
7
Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Introduction: Spatial for Big Data
8
Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
What is Spatial Data Integral part of almost every database
• Business data that contains or describes location
– Geographic features (roads, rivers, parks, etc.)
– Assets (pipe lines, cables, transformers,
– Sales data (sales territory, customer registration, etc.)
– Street and postal address (customers, stores, factories, etc.)
• Anything associated with a physical location
• Described by coordinates or implicitly as text (place name), ...
• Location is a “universal key” relating otherwise unrelated entities
Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Linking information by location Are these data points related?
• Tweet: sailing by #goldengate
• Instagram image subtitle: 골든게이트 교*
• Text message: Driving on 101 North , just reached border between Marin County and San Francisco County
• GPS Sensor: N 37°49′11″ W 122°28′44″
• Now find all data points around Golden Gate Bridge ...
* Golden Gate Bridge (in Korean)
Copyright © 2015 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Highly Restricted
Oracle Big Data – Spatial Features
• Geo-enrichment for Data Harmonization
– Resolution of location-related information
– Determination of location hierarchies
• Categorization and filtering – Tracking, proximity analysis, geo-fencing and categorization based on location
• Data preparation
– Large scale geoprocessing for cleansing, preparation of imagery, sensor data, and raw data input
• Data visualization
11
Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Use Cases: Geocoding, Geo-Enrichment
12
(i) Geocode Call Data Records, sensor data, other sources to aggregate and display wireless network performance (dropped calls, utilization)
(ii) Transportation origin-destination analysis. Combine transit card/payment info and other sources to determine where (and how many) people travel to, starting from any station on a transit network
(iii) Geotagged Twitter: where are the tourists and locals tweeting
i ii
iii
Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Use case: Categorization, filtering, aggregation
13
Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Use case: Data preparation
14
Mosaic images
Terrains and contours Shaded reliefs
Pyramiding: layers at different resolution
Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Spatial Features – Technical overview
• Support for spatial data in 2D or 3D in various formats, geodetic or projected
• Support for geo-referenced imagery such as satellite images in many formats
• MapReduce framework for resolution of placenames and determination location hierarchies, including reference dataset
• Spatial indexing techniques for fast retrieval of spatial data
• Library of spatial operators for geometric analysis (inside, within distance, nearest neighbor, ...)
• Library of image processing functions (mosaic, reprojection, format conversion, analysis, ...)
• Console for visual analysis, indexing, processing
– Sample JEE application to be deployed in Jetty
Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Console
Create Index on spatial data in HDFS
Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Console
Run Map Reduce job to perform categorization based on spatial hierarchy
Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Console
Results in Console
“Tweets in May by State”
Copyright © 2015 Oracle and/or its affiliates. All rights reserved. Oracle Confidential – Internal/Restricted/Highly Restricted 19
Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Graph Data Model
What is a graph? – A set of edges (links) and vertexes (nodes) (and optionally properties)
– A graph is simply linked data
Why do we care?
– Graphs are everywhere • Social networks/Social Web (Facebook, Linkedin, Twitter, Baidu, Google+,…)
• Cyber networks, power grids, protein interaction graphs
• Knowledge graphs (IBM Watson, Apple SIRI, Google Knowledge Graph)
– Graphs are intuitive and flexible • Easy to navigate, easy to form a path, natural to visualize
• Do not require a predefined schema
E
A D
C B
F
Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Why Graph Databases Now?
Rise of social networking Google, Yahoo, Twitter, Facebook, Linked In
Enterprise applications increasingly need to model data relationships
Telecoms: Network & Data center management, identity management Financial Services: Fraud detection; cross-selling Media & Publishing: Social apps, recommendation, sentiment Health Care: CRM, fraud detection
Modeling complex relationships as graphs is efficient Improves performance Simplifies queries, traversal, search and analytics
21
Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Graphs Are Big . . . and Getting Bigger
• Social Scale* – 1 billion vertices, 100 billion edges
• Web Scale* • 50 billion vertices, 1 trillion edges
• Brain Scale* • 100 billion vertices, 100 trillion edges
* An NSA Big Graph Experiment
http://www.pdl.cmu.edu/SDI/2013/slides/big_graph_nsa_rd_2013_56002v1.pdf
Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Graph for Social and Unstructured Data Analysis
Graph is a powerful tool for Data Analysis
you capture fine-grained, arbitrary
By representing your data as a graph with
relationships between data entities
Individual relationships are represented as links
When analyzing such a graph,
you are using explicit relationships
to find implicit information
about your data
Without computing multiple joins
24
Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Oracle Supports Different Kinds of Graphs
RDF Data Model
• Data federation
• Knowledge representation
• Graph pattern analysis
Social Network Analysis
National Intelligence Public Safety Social Media search Marketing - Sentiment
Linked Data / Enterprise Metadata
Property Graph Model
• Graph Search & Analysis
• Big Data analytics
• Entity analytics
Life Sciences Health Care Publishing Finance
Spatial Network Analysis
Logistics Transportation Utilities Telcoms
Network Data Model
• Network path analysis
• Multi-model modeling
Use Case Graph Model Industry Domain
Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
The Property Graph Data Model
• A set of vertices (or nodes) – each vertex has a unique identifier.
– each vertex has a set of in/out edges.
– each vertex has a collection of key-value properties.
• A set of edges (or links) – each edge has a unique identifier.
– each edge has a head/tail vertex.
– each edge has a label denoting type of relationship between two vertices.
– each edge has a collection of key-value properties.
https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model
Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Common Graph Analysis Use Cases
Purchase Record
customer items
Product Recommendation Influencer Identification
Communication Stream (e.g. tweets)
Graph Pattern Matching Community Detection
Recommend the most similar item purchased by similar people
Find out people that are central in the given network – e.g. influencer marketing
Identify group of people that are close to each other – e.g. target group marketing
Find out all the sets of entities that match to the given pattern – e.g. fraud detection
27
Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Graph Analysis Examples:
• Attribute searching (Get people with a given name)
• Vertex/edge adjacency (Get people that like a given Web page)
• Fixed-length paths (Get the friends of the friends of a given person)
• Reach-ability (Is there a “friend” connection between two people?)
• Pattern matching (Get the common friends between two people)
• Aggregates (Get the number of friends of a given person)
28
Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Use cases
Social Media Management & Analysis Identifying common friends and interests; identify primary influencers in social network; graph data management of large social media services
Online Product Recommendations Recommender systems; sentiment analysis; customer churn analysis; customer behavior analytics; customer trend prediction
Internet of Things Manage data properties and relationships for complex webs of inter-operating devices and systems; predictive modeling of system behavior
Cyber-Security Fraud detection; identity management; reveal clusters of similar behaviors and properties; discover relationships based on pattern matching
29
Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Product Details: Graph Features
30
Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Graph Architecture Oracle Big Data Spatial and Graph
Scalable and Persistent Storage
Graph Data Access Layer API
Graph Analytics In-memory Analytic Engine
REST W
eb Service
Blueprints & SolrCloud / Lucene
Property Graph Support on Apache HBase and Oracle NoSQL
Pyth
on
, Perl, PH
P, Ru
by,
Javascript, …
Java APIs
Java APIs
31
Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Big Data Spatial and Graph Property Graph Features
• Highly scalable graph database and analytics engine
• Implemented on Apache HBase and Oracle NoSQL Database
• Rich developer APIs
– Blueprints, REST, Java graph plus support for Groovy, Python, PHP, Perl, Ruby, and JavaScript
• Fast, scalable suite of social network analysis functions
– Ranking, centrality, recommender, community detection, path finding…
– Targeted to address main industry requirements
• Manageability
– Bulk load
– Console to execute Java and Gremlin APIs
32
Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Support for Open Source TinkerPop Graph Tool Stack Oracle Big Data Spatial and Graph Blueprints API implementation provides support for the de-facto graph database standard TinkerPop component stack.
These include query language, dataflow, REST APIs, and others.
33
Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Property Graph Data Access Layer
• Tinkerpop Tools: Blueprints, Gremlin, Rexter supported
• Graph schema optimized on Apache HBase
• Graph schema optimized on Oracle NoSQL Database
• GraphML, GML, GraphSON, and Oracle-defined flat files (.ope & .opv)
• Bulk load of property graph data
34
Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Property Graph Data Access Layer
• Parallel scan of property graph data
• Apache Lucene Text search of graph data
• Groovy shell for accessing property graph data
• iPython-based interface example
• SolrCloud integration
35
Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Developer APIs for Analytics
• Blueprint APIs
– Modify/insert/delete graph edges, vertices, key-values
– Blueprints stack (Gremlin, Pipes, etc.) provide additional functionality
• Java: High level graph analysis APIs expose core functions of graph engine
– Customers perform graph analysis using these Java APIs
– Graph and subgraph identification (using key-value constraints)
– R access through Java APIs
• REST APIs
– Graph analysis
– Graph and sub-graph identification (using key-value constraints)
• Web scripting languages supported through REST APIs above
– PHP, Python, Ruby, Groovy, etc.
Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Property Graph in-Memory Graph Analytics
• Parallel data reading from data access layer into memory
• Choice of deployment
– Standalone application server
– On Hadoop node
• Graph formats (in addition to those supported by the data access layer)
– EBin, Adjacency list, Edge List
• J2EE container support (WLS, Tomcat, Jetty)
Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
35 Graph Functions
Detecting Components and Communities
Tarjan’s, Kosaraju’s, Weakly Connected Components, Label Propagation (w/ variants), Soman and Narang’s
Ranking and Walking
Pagerank, Personalized Pagerank, Betweenness Centrality (w/ variants), Closeness Centrality, Degree Centrality, Eigenvector Centrality, HITS, Random walking and sampling (w/ variants)
Evaluating Community Structures
∑ ∑
Conductance, Modularity Clustering Coefficient (Triangle Counting) Adamic-Adar
Path-Finding
Hop-Distance (BFS) Dijkstra’s, Bi-directional Dijkstra’s Bellman-Ford’s
Link Prediction SALSA (Twitter’s Who-to-follow)
Other Classics Vertex Cover Minimum Spanning-Tree(Prim’s)
Copyright © 2015 Oracle and/or its affiliates. All rights reserved. 39
Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
In-Memory Graph Analysis Framework
• Large graph analysis is time-consuming because …
– The computation typically involves touching most nodes and edges in the graph
– The data-access pattern is random
• In-memory, parallel framework for fast graph analytics
• Exploits the architecture of modern servers – The computation is parallelized using multiple CPU cores
– The non-sequential data-access is mitigated with large DRAMs
40
Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Oracle Property Graph Engine
• Reads graph from Apache HBase or Oracle NoSQL
• Data Access Layer filtering used to create subgraph for Property Graph Engine analytics
Oracle Property Graph or RDF (HBase or NoSQL)
Property Graph Engine
Analytic Request
Analytic Request
Analytic Request
Analytic Request
Analytic Request
Analytic Request
Trans-actional Request
41
Copyright © 2015 Oracle and/or its affiliates. All rights reserved. 42