Copyright © 2014 Oracle and/or its affiliates. All rights reserved.
Coolest Graph Features in Oracle Database 12c
Xavier Lopez, Ph.D., Senior Director
Zhe Wu, Ph.D., Architect
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Agenda
• Graph Database Strategy
• Customer Use Cases
• Oracle Spatial and Graph RDF Graph Features
• Future Plans
Graph Database Strategy
Support Graph Data Types… …On all enterprise platforms
• Oracle Database
• Oracle NoSQL Database
• Oracle Big Data Appliance
• Oracle Cloud
What Sets Us Apart?
• Scalability: Trillions of triples
• Transactional: Concurrent loading and updates with ACID properties
• Security: Oracle Label Security (OLS) labels at the “triple” level
• Standards based: W3C
• Manageable: Use existing DB tools, utilities and expertise
• Multi-type support: graph, relational, search, geospatial …
• Multi-platform: Relational database, NoSQL, Hadoop
RDF Graph v. Property Graph
RDF Semantic Graphs
• Use Case: Linked data, semantic metadata layer
• Analytics: Pattern matching, inferencing
• Analytics Execution: In-database
Property Graphs
• Use Case: Social network analysis
• Analytics: Clustering, centrality, page rank, path finding
• Analytics Execution: In-memory, in-database
RDF Semantic Graph feature of Oracle Spatial and Graph for Oracle Database 12c
Two Application Use Cases
Linked Data; Entity Analytics
• Find related content & relations by navigating connected entities
• “Reason” across entities
• Unified metadata model for distributed data sources
• Flexible model for sparse and evolving data
• Validate semantic and structural consistency
• SPARQL pattern matching
• Detecting related entities across large, sparse, disparate collections of data
• Inferencing: Applying rules on asserted data
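To make "SPARQL pattern matching" concrete, here is a minimal sketch in Python of how a set of triple patterns can be matched against a graph. It is an illustration only, not how Oracle Database evaluates SPARQL: the data, the prefixed names, and the function names (match_pattern, query) are all hypothetical, and terms beginning with "?" are treated as variables.

```python
# Minimal sketch: match SPARQL-like triple patterns against an in-memory graph.
# Terms starting with "?" are variables; every other term must match exactly.

def match_pattern(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            if result.setdefault(p_term, t_term) != t_term:
                return None  # variable is already bound to a different value
        elif p_term != t_term:
            return None
    return result

def query(graph, patterns):
    """Return all variable bindings that satisfy every pattern (a join)."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in graph
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return bindings

# Hypothetical data, invented for this example.
graph = [
    (":alice", ":memberOf", ":groupA"),
    (":bob", ":memberOf", ":groupA"),
    (":alice", ":residesIn", ":UK"),
    (":bob", ":residesIn", ":Morocco"),
]

# "Which members of :groupA reside in which country?"
results = query(graph, [("?p", ":memberOf", ":groupA"), ("?p", ":residesIn", "?c")])
```

Each pattern narrows the set of candidate bindings, which is why sparse, loosely structured data can still be queried without a fixed schema.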
Graph-based Metadata Layer
Linked Data in Support of Distributed Data
– W3C standard, flexible model for sparse and evolving data
– Common vocabulary enables data integration & app development
– Relational data stays in place, apps don’t need to change
[Figure labels: Database Server (HR Database, Sales Database, Inventory Database; HR Schema, Inventory Schema, Sales Schema); Mid-Tier Server (Application 1, Application 2, Application 3); SQL; SPARQL; RDF Graph, Inventory Graph, Sales Graph; Shared Ontologies]
Linked Data in Enterprise
[Figure labels: Access & Presentation Layer; Index; Semantic Graph model; Data Servers; Content Mgmt; BI Server; Data Warehouse; Transaction Systems; Hadoop Appliance; Event Server; Data Sources / Types; Machine Generated Data; Subscription Services; Human Sourced Information; Social Media]
Linked Data / Enterprise Metadata
Industries
• Life Sciences
• Finance
• Media
• Networks & Communications
• Defense & Intelligence
• Police
Hutchinson 3G Austria
Novartis Institutes for BioMedical Research (NIBR)
Business Challenge
• Link database information on genes, proteins, metabolic pathways, compounds, ligands, etc. to original sources.
• Increase productivity for accessing, sharing, searching, navigating, cross-linking, analyzing internal/external data
Solution
• Semantic integration layer on RDF graph
• Rich domain-specific terminology (biology, chemistry and medicine): 1.6 M terms
• Terminology Hub: 8 GB of referential data that cross-references between data repositories.
RDF Semantic Graph-based Applications
Linked Data; Entity Analytics
• Find related content & relations by navigating connected entities
• “Reason” across entities
• Unified metadata model for distributed data sources
• Flexible model for sparse and evolving data
• Validate semantic and structural consistency
• SPARQL pattern matching
• Detecting related entities across large, sparse, disparate collections of data
• Inferencing: Applying rules on asserted data
Knowledge Management in Intelligence Domain
National Intelligence Scenario
[Figure labels, pipeline: Data Sources (Contents Repository, Databases, Web resources, Blogs, Mails, news, RSS feeds, Enterprise Data, Spatial, Documents, images); Information Extraction (Feature Extraction, Term Extraction); Extracted Entities & Relationships; RDF; Intelligence Ontologies; SQL/SPARQL; Search, Presentation, Report, Visualization, Query]
[Figure labels, example entity graph with edges “Currently resides”, “Member of”, “Supports”, “Has”, and “Link ?”:
Person: Abduwali Abdukhadir Muse; Nationality: Somalian; Country: UK; Group: Al Shabab; Ideology: Islamist
Person: ?; Nationality: Pakistani; Country: Pakistan; Group: ?
Person: Chehab Abdouljamid Bouyaly; Country: Morocco; Group: al Qaeda]
Oracle Spatial and Graph RDF Semantic Graph Features
Oracle Database 12c RDF Semantic Graph Database
• Exadata ready
• Compression & partitioning
• Parallel load, inference, query
• High availability
• Label security: triple-level
• W3C standards compliance
• Semantic Indexing of text
• Enterprise Manager
Load / Storage
• Native RDF graph data store
• Manages billions of triples
• Optimized storage architecture
Query
• SPARQL-Jena/Joseki, Sesame
• SQL/graph query, B-tree indexing
• Ontology assisted SQL query
Reasoning
• RDFS, OWL2 RL, EL, SKOS
• User-defined rules
• Incremental, parallel reasoning
• User-defined inferencing
• Plug-in architecture
Analytics
• Semantic indexing framework
• Integration with OBIEE, Oracle R Enterprise, Oracle Data Mining
Support for Apache Jena and OpenRDF Sesame
Provides application developers with:
• Easy-to-use Java APIs to access Oracle databases and RDF files
• A standard-compliant SPARQL web service endpoint (Joseki, Fuseki)
• Data loading (RDF/XML, N-TRIPLES, N-QUADS, TriG, Turtle)
• JSON output
• Oracle-specific extensions for query execution control and management
Leverage existing investments in open source frameworks
RDB to RDF Mapping
Relational to RDF Mapping
• RDF views on relational tables
• Enables SPARQL query on distributed resources
• Views: Automatic and custom
• Aligns with W3C RDB2RDF standard
• No duplication of data and storage
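As a rough illustration of what a relational-to-RDF view computes, the Python sketch below turns one table row into triples. It loosely follows the naming pattern of the W3C Direct Mapping (row IRI built from the table and primary key, predicate IRI built from the table and column), but it is heavily simplified: every value becomes a plain string literal, and datatypes, foreign keys, and IRI escaping are ignored. The function name, base IRI, and table are assumptions made for this example, not Oracle's implementation.

```python
def row_to_triples(base_iri, table, pk_column, row):
    """Map one relational row (a dict of column -> value) to (s, p, o) triples."""
    # Subject IRI identifies the row through its primary key value.
    subject = f"<{base_iri}{table}/{pk_column}={row[pk_column]}>"
    triples = []
    for column, value in row.items():
        if value is None:
            continue  # a NULL column produces no triple
        predicate = f"<{base_iri}{table}#{column}>"
        triples.append((subject, predicate, f'"{value}"'))
    return triples

# Hypothetical row from a hypothetical EMP table.
triples = row_to_triples(
    "http://example.com/hr/", "EMP", "EMPNO",
    {"EMPNO": 7, "ENAME": "Smith", "MGR": None},
)
```

Because the triples are derived from the row on demand, the relational data stays where it is and nothing is stored twice, which is the point of the "no duplication" bullet.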
Oracle Label Security Data Classification
• Fine grained security through integration with Oracle Label Security
• Model level security through GRANT/REVOKE privileges
• Oracle Label Security: mandatory access control
• Labels assigned to both users and data
• Data labels determine the sensitivity of the rows or the rights a person must possess in order to read or write the data.
• User labels indicate their access rights to the data records.
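The interplay of data labels and user labels can be sketched in a few lines of Python. This is a deliberately simplified model that compares a single sensitivity level per label; Oracle Label Security labels are richer than this, and the level names, data, and function name below are hypothetical.

```python
# Hypothetical sensitivity levels, ordered from least to most restricted.
LEVELS = {"PUBLIC": 0, "CONFIDENTIAL": 1, "SECRET": 2}

def readable_triples(labeled_triples, user_label):
    """Return the triples whose data label does not exceed the user's label."""
    clearance = LEVELS[user_label]
    return [triple for triple, label in labeled_triples if LEVELS[label] <= clearance]

# Each triple carries its own data label (triple-level security).
data = [
    ((":projectX", ":hasStatus", ":active"), "PUBLIC"),
    ((":projectX", ":hasBudget", '"1000000"'), "CONFIDENTIAL"),
    ((":projectX", ":hasContractor", ":acme"), "SECRET"),
]

visible = readable_triples(data, "CONFIDENTIAL")
```

A user labeled CONFIDENTIAL sees the first two triples; the SECRET triple is filtered out as if it were not in the graph.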
Core Inferencing Features
• Forward-chaining based inference engine in the database
• Native rulebases: RDFS, OWL 2 RL, OWL 2 EL, SKOS
• Validation of inferred data
• Proof generation
• User defined inferencing
  - Temporal reasoning, spatial reasoning
• Ladder Based Inference
  - Fine grained security for inference graph
• Integration with external OWL 2 reasoners (TrOWL, Pellet)
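Forward chaining means applying rules to the asserted triples repeatedly until nothing new can be derived. The Python sketch below shows the idea with just two RDFS rules (rdfs9 and rdfs11); it is a naive illustration of the technique, not the database's inference engine, and the example classes and the function name are made up.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

def forward_chain(asserted):
    """Apply two RDFS rules until no new triples appear (a fixpoint)."""
    inferred = set(asserted)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                if p2 != SUBCLASS or o != s2:
                    continue
                if p == SUBCLASS:   # rdfs11: subClassOf is transitive
                    new.add((s, SUBCLASS, o2))
                elif p == TYPE:     # rdfs9: instances belong to superclasses
                    new.add((s, TYPE, o2))
        if new <= inferred:
            return inferred - set(asserted)  # only the entailed triples
        inferred |= new

# Hypothetical asserted data.
asserted = [
    (":Student", SUBCLASS, ":Person"),
    (":Person", SUBCLASS, ":Agent"),
    (":alice", TYPE, ":Student"),
]

entailed = forward_chain(asserted)
```

The entailed triples are kept separate from the asserted ones, mirroring the way an entailment is stored alongside, rather than inside, the model it was derived from.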
RDF Semantic Graph: Graph Visualization & Modeling Support
Graph Visualization: Cytoscape
Semantic Modeling: Protégé
Analyzing RDF with Oracle BI and Oracle Advanced Analytics
Oracle BI; Oracle Advanced Analytics
Oracle Partner Tools: IO Informatics
Oracle Partner Tools: Tom Sawyer Social Network Analysis
Manageability of RDF Semantic Graph
Built-in support from Oracle Database utilities and tools
Manage
• Control query execution:
  – In database & Jena client
• Create & monitor graph w/ SQL Developer:
  – Semantic Network
  – Models, virtual models
  – Btree indexes
  – Rule bases
  – Entailments
  – Security data labels
  – Semantic index policies
Tune / Analyze
• Tune load/query/inference:
  – Parallelism
  – Btree indexing triple/quad
  – Typed literals indexing
  – SPARQL query hints
  – Statistics gathering
  – Dynamic Sampling
• Analyze performance:
  – Enterprise Manager: view optimizer plans, monitor execution / resource usage
Ingest / Replicate / Recover
• Bulk load:
  – Apache Jena bulk loader
  – Oracle external tables & SQL*Loader (Direct Path) w/ PL/SQL Bulk Load API
• Replicate & recover:
  – Data Guard: physical standby
  – Data Pump: staging tables
  – Recovery Manager: RMAN
Open Geospatial Consortium: GeoSPARQL Support
Defines a Vocabulary for Spatial Query Patterns
– Classes
  • Spatial Object, Feature, Geometry
– Properties
  • Topological relations
  • Links between features and geometries
– Datatypes for geometry literals
  • ogc:wktLiteral, ogc:gmlLiteral
– Query Functions
  • Topological relations, distance, buffer, intersection, …
RDF Graph for Oracle NoSQL
Graph Support on Oracle NoSQL DB
Brings horizontal scalability to RDF graph applications
• RDF Graph support in Oracle NoSQL Database Enterprise Edition
• High performance Key Value store
• SPARQL 1.1 access to graph data
• Jena & Joseki SPARQL Web Services
• Massive horizontal scalability
• Support for World Wide Web Consortium (W3C) Semantic Web standards
RDF Graph for Oracle NoSQL
When to Consider a NoSQL Database for Graphs
Horizontal scalability, low query latency/cost, ease of install & management
• High volume, simple queries (low latency)
• Queries aggregating over most of the graph (e.g. what are the hobbies of the 100 most popular people in the network)
• Frequent, large-scale updates
• Large Data Centers
Quick Steps to Get Started
Install Oracle Database 12c, or use a Prebuilt VM from OTN
Initialize
- Create a tablespace 'ts'
- Run as SYS in SQL*Plus: exec sem_apis.create_sem_network('ts')
- Run as SYS (for 12.1.0.2 only) in SQL*Plus: exec mdsys.enableGeoRaster;
Configure Joseki/Fuseki web service endpoint
Using Java APIs
- Load/Query/Inference through GraphOracleSem, DatasetGraphOracleSem, OracleBulkUpdateHandler, …
Using SQL/PLSQL APIs
- exec create_sem_model, insert/delete triples, bulk load, run SEM_MATCH, create_entailment, …
SPARQL Query, SPARQL Update
REST APIs
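Once a Joseki/Fuseki endpoint is configured, any HTTP client can send it a query using the W3C SPARQL Protocol, which passes the query text in a URL-encoded "query" parameter. The Python sketch below only builds such a request URL; the endpoint address is a placeholder that depends on the actual deployment, and the function name is an assumption for this example.

```python
from urllib.parse import urlencode

def sparql_get_url(endpoint, query):
    """Build a SPARQL Protocol GET URL: <endpoint>?query=<url-encoded query>."""
    return endpoint + "?" + urlencode({"query": query})

# Hypothetical endpoint; the real host, port, and path depend on the deployment.
ENDPOINT = "http://localhost:8080/joseki/oracle"
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"

url = sparql_get_url(ENDPOINT, QUERY)

# To run the query against a live endpoint (not executed here), one could use:
#   import urllib.request
#   req = urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})
#   body = urllib.request.urlopen(req).read()
```

Requesting JSON results through the Accept header corresponds to the "JSON output" capability listed for the Jena support.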
Oracle Spatial and Graph - LUBM 200K on 3-Node RAC X2-4
Load, Inference and Query Performance
48+ Billion edges graph
The LUBM 200K Graph has 48+ Billion triples (edges)
– Original graph has 26.6 Billion unique triples (quads)
– Inference produced another 21.4 Billion triples
Data Loading Performance
– Triples Loaded and Indexed Per Second (TLIPS): 273K
Inference Performance
– Triples Inferred and Indexed Per Second (TIIPS): 327K
SPARQL Query Performance
– Query Results Per Second (QRPS): 459K
Setup:
Hardware: Sun Server X2-4, 3-node RAC
- Each node configured with 1TB RAM, 4 CPU 2.4GHz 10-Core (Intel E7-4870)
- Storage: Dual Node 7420, both heads configured as: Sun ZFS Storage 7420, 4 CPU 2.00GHz 8-Core (Intel E7-4820), 256G Memory, 4x SSD SATA2 512G (READZ), 2x SATA 500G 10K. Four disk trays with 20 x 900GB disks @10Krpm, 4x SSD 73GB (WRITEZ)
Software: Oracle Database 11.2.0.3.0, SGA_TARGET=750G and PGA_AGGREGATE_TARGET=200G
Note: Only one node in this RAC was used for the performance test. Test performed in April 2013.
Oracle Spatial and Graph - LUBM 4400K on Exadata X4-2
Load, Inference and Query Performance
1.08 Trillion edges graph
Data set: LUBM 4400K
Degrees of Parallelism: 256*
Load (B triples/hr): 605.4B / 115.2 hrs
OWL Inference (B triples/hr): 475.6+ B / 86 hrs 30 m
Query (B answers/hr): 92.5B / 22.5 hrs
* A mix of DOP used: 296, 256, 192
Data Loading Performance
– Triples Loaded and Indexed Per Second (TLIPS): 1.420M
Inference Performance
– Triples Inferred and Indexed Per Second (TIIPS): 1.527M
SPARQL Query Performance
– Query Results Per Second (QRPS): 1.130M
Setup:
Exadata X4-2 High capacity full rack
ZS3-2 with 2 controllers, 8 trays of disk
Eight compute nodes of Exadata
Oracle 12.1.0.1 DB standard install of Exadata
Open cursors = 1000
Processes = 1000
SGA = 132GB, PGA = 100GB
32K blocksize was given to all graph tablespaces
TEMP group was created with 3 bigfile tablespaces
Test performed in Aug/Sept 2014.
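As a rough cross-check of the LUBM 4400K figures, the short Python calculation below derives the graph size and average per-second rates from the totals and elapsed times on the slide. It assumes each rate is simply total divided by elapsed time. Under that assumption the inference rate reproduces the reported 1.527M, while the load and query rates come out slightly above the reported 1.420M and 1.130M, so the slide presumably used somewhat different totals or timings for those two.

```python
def per_second(total, hours):
    """Average items processed per second over a run lasting `hours` hours."""
    return total / (hours * 3600)

BILLION = 1e9
loaded = 605.4 * BILLION     # triples loaded in 115.2 hrs
inferred = 475.6 * BILLION   # triples inferred in 86 hrs 30 min
answers = 92.5 * BILLION     # query answers returned in 22.5 hrs

total_edges = loaded + inferred              # about 1.08 trillion
load_rate = per_second(loaded, 115.2)        # about 1.46 million per second
inference_rate = per_second(inferred, 86.5)  # about 1.53 million per second
query_rate = per_second(answers, 22.5)       # about 1.14 million per second
```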
Best Practices in Solving Performance Issues
• When there is an underperforming SQL in RDF data loading, inference, or query operations, check:
• Have you gathered statistics?
  – APIs: export_model_stats, export_entailment_stats, export_network_stats, import_model_stats, import_entailment_stats, import_network_stats
• Have you tried parallel execution?
  – Balanced hardware is key.
• Have you tried dynamic sampling? (Level 6, 8, 11)
• Is there a lack of indexes (including text index)?
  – DO NOT just add indexes without careful & thorough testing
Best Practices in Solving Performance Issues (2)
• When there is an underperforming SQL in RDF data loading, inference, or query operations, check:
• Have you looked at the plan?
• Is it possible to write the same query in a different way?
• Is it possible to simplify?
  – Simpler queries: better chance to find more efficient ways to execute
• Tweak plan through hints
• Send a small, reproducible test case with the execution plan to Oracle Support or post it on the Forum
Best Practices in Solving Performance Issues (3)
• Find the top thread(s) in Java VM
• Are there excessive GC activities?
  – Try -XX:+UseParallelGC, -XX:+UseConcMarkSweepGC, …
• Has the heap size been set properly?
  – Try larger heap size, analyze heap by performing a heap dump
• Send a small, reproducible test case with the thread dump to Oracle Support or post it on the Forum
Cool Ongoing Activities:
• Enable Oracle Cloud Services: Oracle Social Network
• Integration with Oracle business applications and middleware
• Ongoing support for RDF Graph on all major platforms
  – Relational Database
  – NoSQL Database
  – Big Data (Hadoop)
  – Cloud
Graph Related Events at OOW
• RDF Semantic Graph Meet-up
  – Wednesday, 4:00pm. Moscone South, Lobby/Mezzanine
• Big Fast Graph Analysis for Hadoop (session)
  – Thursday, 2:30pm. Moscone North 131
• Demo Grounds
  – Graph Database
  – Oracle Labs (Graph Analytics)
W3C Semantic Technology Stack
http://www.w3.org/2007/03/layerCake.svg
• Core Technologies
  – URI: Uniform resource identifier
  – RDF: Resource description framework
  – RDFS: RDF Schema
  – OWL: Web ontology language
What is RDF
• A graph data model for web resources and their relationships
• The graph can be serialized into RDF/XML, N3, N-TRIPLE, …
• Construction unit: Triple (or assertion, or fact), made of Subject, Predicate, Object
<http://foobar> <:produces> <:mp3>
• Quads (named graphs) add context, provenance, identification, etc. to assertions
<http://foobar> <:produces> <:mp3> <:ProductGraph>
[Figure: example graph with nodes http://www.foobar.com, http://www.foobar.com/products/mp3, http://www.oracle.com, http://www.oracle.com/products/RDF and edges http://…/locatedIn, http://…/produce, http://…/customerOf, http://…/uses]
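The triple and quad forms can be sketched as a small Python data structure that serializes to a line in the style of N-Triples or N-Quads. This is a minimal illustration that handles IRIs only (no literals or blank nodes). The predicate and graph IRIs below are assumptions, because the slide elides them as http://…/produce, and the class name is made up for this example.

```python
from typing import NamedTuple, Optional

class Statement(NamedTuple):
    subject: str
    predicate: str
    obj: str
    graph: Optional[str] = None  # set for quads (named graphs), None for triples

    def serialize(self) -> str:
        """Render as an N-Triples line, or an N-Quads line when a graph is set."""
        terms = [self.subject, self.predicate, self.obj]
        if self.graph is not None:
            terms.append(self.graph)
        return " ".join(f"<{term}>" for term in terms) + " ."

# Base IRI from the slide's example; the paths appended to it are assumed.
EX = "http://www.foobar.com/"
triple = Statement(EX, EX + "produce", EX + "products/mp3")
quad = triple._replace(graph=EX + "ProductGraph")
```

The quad is the same assertion as the triple plus a fourth term naming the graph it belongs to, which is where context or provenance can be attached.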