marklogic semantics overview · 2. i want to say something about the data (ontology) is licensed...
TRANSCRIPT
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Stephen Buxton, Senior Director, Product Management, MarkLogic
MARKLOGIC SEMANTICS OVERVIEW
It's not magic … MarkLogic Semantics
SEMANTICS IN A NUTSHELL
SLIDE: 4
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Triples – A Data Model Data is stored in triples, expressed as :Subject :Predicate :Object
:Jamie Vardy :team :Leicester
:Leicester :league :Premier League
Jamie Vardy
Leicester team
isIn England
birthPlace
Sheffield
1987
birthYear
striker
position
Premier League
league
Foxes nickname
RDF triples
SLIDE: 5
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Triples – A Data Model Data is stored in triples, expressed as :Subject :Predicate :Object
:User1 :runs :App1
:App1 :runsOn :Cluster1
User1
App1 runs
runsOn
Cluster1
rank
Senior Manager
Geneva
basedIn
Compliance Officer
role
TopSecret
requires
Database1 accesses
RDF triples
See also: http://ontology.it/itsmo/v1/itsmo.html
SLIDE: 7
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Triples – A Data Model
Wayne Rooney
Manchester livesIn
isIn
England
birthPlace
Croxteth
1985
birthYear
Manchester United
team
Northwest England
region
Paul Griffiths sheriff
Data is stored in triples, expressed as :Subject :Predicate :Object
:User1 :runs :App1
:App1 :runsOn :cluster1
Senior Manager
User1 App1
cluster1 Geneva
Compliance Officer
TopSecret
database1
SLIDE: 8
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
URIs – Connecting the Graph URI (Universal Resource Identifier) – Like a URL, but it Identifies a thing, it does
not Locate a Thing
URIs make graphs from Triples and combine graphs from different sources
Jamie Vardy
team
Leicester
SLIDE: 9
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
URIs – Connecting the Graph URI (Universal Resource Identifier) – Like a URL, but it Identifies a thing, it does
not Locate a Thing
URIs make graphs from Triples and combine graphs from different sources
Jamie Vardy
team
Leicester
Population
Flag
980,800
Leicester
CIA FACTBOOK
Coordinates Leicester N 52° 43' W 1° 11'
GEONAMES SAME URI
SLIDE: 10
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
URIs – Connecting the Graph URI (Universal Resource Identifier) – Like a URL, but it Identifies a thing, it does
not Locate a Thing
URIs make graphs from Triples and combine graphs from different sources
Jamie Vardy
team
Population
Flag
980,800 CIA FACTBOOK
Coordinates Leicester
N 52° 43' W 1° 11'
GEONAMES
SLIDE: 11
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
SPARQL – A Query and Update Language Query with SPARQL, gives us:
Simple lookup – Find people who run App1
Graph traversal – Find people who run an application that depends on Cluster1
Inference – Find people who depend on Cluster1
Update – App1 now runs on Cluster2
SELECT ?user
FROM <myGraph>
WHERE {
?user :runs :App1
}
SPARQL Simple Lookup “Find people who run App1”
User1 runs App1
SELECT ?user
FROM <myGraph>
WHERE {
?user :runs ?app .
?app :runsOn :Cluster1 .
}
SPARQL Graph Traversal “Find people who run an application that depends on Cluster1”
User1
Cluster1
runsOn
App1 runs SELECT ?user
FROM <myGraph>
WHERE {
?user :runs/:runsOn :Cluster1 .
}
SELECT ?user
FROM <myGraph>
WHERE {
?user :sees :Database1 .
}
SPARQL Inference “Find people who can see Database1”
User1 App1
Database1
accesses
runs
sees
DELETE { ?app :runsOn :Cluster1 }
INSERT { ?app :runsOn :Cluster2 }
WHERE
{ ?app :name 'my application'
}
SPARQL Update “App1 now runs on Cluster2”
App1 runsOn Cluster1
runsOn
Cluster2
SLIDE: 16
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Visualization – Your Data At A Glance
SLIDE: 17
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Visualization – Your Data At A Glance
Ontologies and Inference
SLIDE: 19
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Ontology – A Vocabulary With A Shared Meaning 1. I want to assert some facts (data)
<Dept1> owns <App1>
2. I want to say something about the data (ontology) <App1> is licensed software
3. I want to say something more general about the data model (ontology) Every piece of licensed software is an asset
4. For #2 and #3, I need some language (ontology language)
SLIDE: 20
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Ontology – A Vocabulary With A Shared Meaning My Ontology says: <App1> <isA> <licensedSoftware> <licensedSoftware> <subClassOf> <asset> <hardware> <subClassOf> <asset>
My Ontology Language says:
Formal definition of <isA>, <subClassOf> If A is a B and B is a subclass of C then A is a C
"Show me all the assets that Dept1 owns"
Popular Ontology Languages: RDFS OWL RDFS-Plus OWL-Horst
SLIDE: 21
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
"show me all the assets that <Dept1> owns"
The data says: <Dept1> owns <App1>
The ontology says: <App1> <isA> <licensedSoftware> <licensedSoftware> <subClassOf> <asset>
The ontology language says: If A is a B and B is a subclass of C then A is a C
Therefore <App1> is an asset that <Dept1> owns.
"show me all the zigs that <bis> aks"
The data says: <bis> aks <bab>
The ontology says: <bab> cuf <deb> <deb> bon <zig>
The ontology language says: If A cuf B and B bon C then A cuf C
Therefore bab is a zig that bis aks
"But I know what owns/is a /subclass means …"
SLIDE: 22
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Inference – Automatically Infer "New" Data Inference = Data + ontology + ontology language
Data is triples in the database
Ontology is triples in the database
Ontology language is a vocabulary with a shared meaning – rules
Apply any set of rules to any query!
Make up your own rules!
Simpler, more robust data modeling
New insights
SEMANTICS IN A NUTSHELL
SLIDE: 24
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
MarkLogic Architecture
STORAGE LAYER
Scalability and Elasticity
ACID Transactions
Triple Store
INTERFACE LAYER
mlcp JSON, XML, RDF, Geo, Binaries
REST API
Graph / SPARQL
QUERY LAYER
JS XQuery SPARQL
JavaScript XQuery SPARQL SQL
INDEXES / CACHE Universal Index
Geospatial Index
Triple Index
Triple Cache
Automated Failover
Reverse Index
Oh the things you can do …
SLIDE: 26
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Use Triples when you want to …
Data Documents Triples
Store and query hundreds of billions of facts and relationships
Explore a graph
Visualize a graph
Leverage standards: data + query
Infer new information
better insights
simpler data modeling
Semantics of data
integration
SLIDE: 27
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Use Documents when you want to …
Data Documents Triples
Easily store heterogeneous data (transactional data, records, free-text)
Schema-agnostic
modeling freedom
integrate without ETL*
Search flexibility and specificity
Fast app development
SLIDE: 28
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Sidetrack – Documents and Data
Title
Date Body
Section
Section Section
Article
Abstract
Paragraph
Paragraph
Paragraph
Type
Date Parties
Seller
Buyer Channel
Trade
Amount
PaidBy
Affiliation
Name
SLIDE: 29
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Benefits of a Document Store and Triple Store Combined
Data Documents Triples
All the benefits of each, plus: Docs can contain triples, Triples can
annotate docs, Graphs can contain docs
– Faster data integration using semantics as the glue
– Ideal model for reference data, metadata, provenance
– Ability to run really powerful queries
Massive speed and scale
Simplicity of a single unified platform
Enterprise features (security, HA/DR, ACID transactions,…)
TRIPLES AND DOCUMENTS
SLIDE: 31
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Triples Alongside Documents
User1
rank
Senior Manager
Geneva
basedIn
Compliance Officer
role
High risk person App1
runsOn
Cluster1 TopSecret
requires
Database1
accesses
runs
SLIDE: 32
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Show me documents that mention App1 (or its dependencies) … and "trades" or "markets" … that were valid yesterday afternoon … that were produced near HQ
see Intelligent Search, Infobox
Show me instructions to access App1
App1 user guide
How to get TopSecret access
Scope of Database1
see Dynamic Semantic Publishing
Triples Alongside Documents
SLIDE: 33
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Documents as Part of the Graph
User1
rank
Senior Manager
Geneva
basedIn
Compliance Officer
role
High risk person
App1
runsOn
Cluster1 TopSecret
requires
Database1
accesses
runs
deep dive
license
user guide tutorialMovie
order
SLIDE: 34
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Document as opaque object
Show me all the instructional documents related to App1
Search inside the document
Show me all the applications that managers use that expire in the next 6 months
Documents as Part of the Graph
SLIDE: 35
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Triples About Documents – Extended Metadata
User1
rank
Senior Manager
Geneva
basedIn
Compliance Officer
role
High risk person
App1
runsOn
Cluster1 TopSecret
requires
Database1
accesses
runs order
format
JSON
English
Delaware
2016-12-31
jurisdiction
expires
Ts and Cs
language
SLIDE: 36
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Triples are a natural way to represent metadata about documents
Extended because that metadata is part of the graph
Example: show me all orders for a TopSecret app that will expire soon
Triples About Documents – Extended Metadata
SLIDE: 37
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Triples About Documents - Integration
App1 order
XML
format
License
Vendor
<VENDOR>
Seller
<seller>
Provider
<prov>
Computer
Asset
equivalent equivalent
The Semantics of Data
Semantic Data Model
SLIDE: 39
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Data Integration: Dirty data
Show me license documents from vendor Acme
Data Integration: Overlapping data
Show me all assets from vendor Acme
Triples About Documents
SLIDE: 40
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Triples as part of a document
<order id="12345"> <VENDOR>Acme Corp</VENDOR> <payment> <amount>3427</amount> <unit>USD</units> <period>annual</period> </payment> <sem:triple> <sem:subject>http://youruri.com/orders/12345</sem:subject> <sem:predicate>http://youruri.com/predicates/expires</sem:predicate> <sem:object>2016-12-31</sem:object> </sem:triple> <sem:triple> <sem:subject>http://youruri.com/orders/12345</sem:subject> <sem:predicate>http://youruri.com/predicates/TsAndCs</sem:predicate> <sem:object>http://youruri.com/terms/34567</sem:object> </sem:triple> <description> .... </description> </order>
XML [JSON] Document With Embedded Triples
<userInfo> <source>myApp44</source> <confidence>100</confidence> <location>37.52 -122.25</location> <icd9-proc-code>1111</icd9-proc-code> <temporal> <systemStart/><systemEnd/> <validStart>2014-04-03T11:00:00</validStart> <validEnd>2014-04-03T16:00:00</validEnd> </temporal> ... <sem:triple> <sem:subject>http://youruri.com/users/11111</sem:subject> <sem:predicate>http://youruri.com/predicates/runs</sem:predicate> <sem:object>http://youruri.com/applications/1111</sem:object> </sem:triple> <sem:triple> <sem:subject>http://youruri.com/users/11111</sem:subject> <sem:predicate>http://youruri.com/predicates/manages</sem:predicate> <sem:object>http://youruri.com/applications/3333</sem:object> </sem:triple> </userInfo>
Set of Triples with XML [JSON] annotation
SLIDE: 43
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Triples as part of a document Embed triples in a document
Triples and document have the same security, transactions, backup, temporality, …
Annotate triples in an entirely generic way (XML or JSON)
Provenance
Confidence
Bitemporal
Query across triples and documents in the same query
SPARQL, restrict result to some source, confidence range, bitemporal range
Search, restrict result to documents that contain some facts or metadata
SLIDE: 44
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Use Triples when you want to …
Data Documents Triples
Store and query hundreds of billions of facts and relationships
Explore a graph
Visualize a graph
Leverage standards: data + query
Infer new information
better insights
simpler data modeling
Semantics of data
integration
SLIDE: 45
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Use Documents when you want to …
Data Documents Triples
Easily store heterogeneous data (transactional data, records, free-text)
Schema-agnostic
modeling freedom
integrate without ETL*
Search flexibility and specificity
Fast app development
SLIDE: 46
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Document Store and Triple Store Combined
Data Documents Triples
All the benefits of each, plus: Docs can contain triples, Triples can
annotate docs, Graphs can contain docs
– Faster data integration using semantics as the glue
– Ideal model for reference data, metadata, provenance
– Ability to run really powerful queries
Massive speed and scale
Simplicity of a single unified platform
Enterprise features (security, HA/DR, ACID transactions,…)
SLIDE: 47
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Summary Semantics is not magic …
MarkLogic is an Enterprise Triple Store
MarkLogic is a Document Database
Documents + Triples + Combinations means:
model your data in the right way
integrate data from many sources in many shapes
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Semantics Use Cases
SLIDE: 50
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Leading Organizations Using MarkLogic Semantics
Intelligent Search
Semantic Metadata Hub
Dynamic Semantic Publishing
Recommendation Engines
Compliance
Entertainment Company
Pharmaceutical Company
SLIDE: 51
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Before MarkLogic
Repair manuals Technical bulletins
Recall notices
Word
SGML
Expert advice
Shredded content
RDBMS
Expert content
RDBMS Manuals, CDs, DVDs
Challenging UX
Challenging UX
1967 1994 2009 2011
SLIDE: 52
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
After MarkLogic
Repair manuals Technical bulletins
Recall notices
Word
SGML
Expert advice
Historical Repair Orders
100 Million Docs 28 Manufacturers +10GB/month
SLIDE: 53
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Getting smarter with semantics
Link different terms that mean the same or similar things
1
Compositional hierarchy to relate each part to the whole (“partonomy”)
2
Engine Engine cooling
Conditioner compressor
gasket oil pan gasket 196,000+ Unique Vehicles
…
Vocabulary 1
Vocabulary 2
Vocabulary 3
Vocabulary 4
Searchable Knowledge
Graph
SLIDE: 55
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Leading Organizations Using MarkLogic Semantics
Intelligent Search
Semantic Metadata Hub Dynamic Semantic Publishing
Recommendation Engines
Compliance
Entertainment Company
Pharmaceutical Company
SLIDE: 56
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Inception Pre-Production Production Post-Production Distribution Archive
Semantics Descriptive Metadata
Technical Administrative
Metadata Taxonomies Content Customers!
Entertainment Company – Semantic metadata hub
Script
Budget Scheduling
Script Supervision
Updates
Prop Inventory
Editorial
Complex Data Integration
SLIDE: 57
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Entertainment Company
Entertainment Company – Semantic metadata hub
Ontology Mgmt, Semantic Enrichment
RDF Triple Store, Search and Query
Metadata Hub Assets
Downstream Systems
RDF
Outputs
Title
HD Master Dates
Production Date
Editing Date
Release Date
International Date
is
Asset
Title
Character
Film Series
Animated
Actress
City
Data Model Using Documents + Data + Triples
Search App
SLIDE: 58
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Leading Organizations Using MarkLogic Semantics
Intelligent Search
Semantic Metadata Hub
Dynamic Semantic Publishing Recommendation Engines
Compliance
Entertainment Company
Pharmaceutical Company
SLIDE: 59
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
BBC – Dynamic Semantic Publishing For 2012 Olympics, semantics helped BBC manage content for web pages with real-time updates—without additional support
1. Diego Costa plays for Chelsea
2. Chelsea is in the Premier league
3. Diego plays in the Premier league
Semantic Inference
X 10,000
SLIDE: 60
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Leading Organizations Using MarkLogic Semantics
Intelligent Search
Semantic Metadata Hub
Dynamic Semantic Publishing
Recommendation Engines Compliance
Entertainment Company
Pharmaceutical Company
SLIDE: 61
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Recommendation Engines
SLIDE: 62
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Recommendation Engines
Talent Kristen Wiig
Acted in
Episode 4 Anne Hathaway and Killers
Part of Played
Character Maharelle Sister
Season 34
Segment The Lawrence Welk Show
Aired on
Date 10/4/08
Era
Acted in
Includes
Part of
SLIDE: 63
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Leading Organizations Using MarkLogic Semantics
Intelligent Search
Semantic Metadata Hub
Dynamic Semantic Publishing
Recommendation Engines
Compliance
Entertainment Company
Pharmaceutical Company
SLIDE: 64
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Amgen – Integrated Drug Terminology Database Connecting internal and external knowledge via the power of open graph data
REFERENCE DATA ARCHITECTURE MarkLogic advantage: Semantics / graph – the
power of native RDF support to ingest standardized datasets, combined with enterprise proof points
Flexible model – potential to further expand from primarily RDF to documents and document-like metadata
SLIDE: 65
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Leading Organizations Using MarkLogic Semantics
Intelligent Search Semantic Metadata Hub Dynamic Semantic Publishing Recommendation Engines Compliance
Entertainment Company
Pharmaceutical Company
It's not magic … MarkLogic Semantics
It's