Semantic Graph Databases: The Evolution of Relational Databases

©2016 Cambridge Semantics Inc. All rights reserved. Company Confidential. Graph - Why, What, How. Barry Zane, Vice President, Engineering [email protected]

Upload: cambridge-semantics

Post on 20-Mar-2017


Category: Data & Analytics



TRANSCRIPT

Page 1: Semantic Graph Databases: The Evolution of Relational Databases


Graph - Why, What, How

Barry Zane, Vice President, Engineering [email protected]

Page 2: Semantic Graph Databases: The Evolution of Relational Databases


The Journey

• Why do relational guys become graph guys?
  – Relational is GREAT
  – “We shape our tools, and are shaped by our tools”
    • When all you have is a hammer…
  – Why Graph is the next evolution
    • Do more, easier, faster, cheaper

Page 3: Semantic Graph Databases: The Evolution of Relational Databases


Real Relational Data Warehouse, Really

• Relational databases are predefined “rectangular” tables of rows and columns.
  – Very natural for subjects (aka rows) with a number of known attributes common to all/most of the subjects.
  – Allows columns to be links (aka keys) to other tables’ subjects.
• Challenged by:
  – Sparsity
  – Multi-valued (one-to-many) attributes need a separate table, and many-to-many needs a “join table”
  – You need to understand the data in advance
• Graphs are real relational, really. Just a little different from the points above!

Page 4: Semantic Graph Databases: The Evolution of Relational Databases


Nodes/Subjects, Edges/Attributes, Values/Objects

• Pretty picture, but what does it mean?
• What is the data model?

Page 5: Semantic Graph Databases: The Evolution of Relational Databases


RDF Triples - Like Key-value Pairs (heterogeneous, unique, atomic, simple)

JoeSmith LivesIn SanDiego
JoeSmith BirthDate 9/17/1975
JoeSmith IsSpouse MaryJones
JoeSmith HasChild BillSmith
JoeSmith HasChild JaneSmith
JoeSmith Attended EDW2016
JoeSmith Hobby “Hiking”
JoeSmith Bought Pants962
MaryJones LivesIn SanDiego
MaryJones BirthDate 7/10/1975
MaryJones IsSpouse JoeSmith
MaryJones HasChild BillSmith
MaryJones HasChild JaneSmith
MaryJones Attended Commicon16
MaryJones Bought Sweater48
MaryJones NickName “MJ”
...

Pants962 SKU 1934758967
Pants962 Color Brown
Pants962 Inseam 32
Pants962 Size 36
Pants962 BoughtBy JoeSmith
Pants962 BoughtBy MikeDoe
Sweater48 SKU 1963095898
Sweater48 Color Red
Sweater48 Size 6
Sweater48 BoughtBy MaryJones
SanDiego Pop 2456824
SanDiego Team Chargers
SanDiego Team Padres
SanDiego Climate “Perfect”
...

(RDF stands for Resource Description Framework… Triples!)
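To make the triple model concrete, here is a minimal Python sketch that stores a few of the triples above as plain tuples. The helper name `describe` is hypothetical and this is not how any particular RDF store is implemented; it only illustrates that sparse and multi-valued attributes need no extra tables.

```python
from collections import defaultdict

# A handful of the slide's triples as (subject, predicate, object) tuples.
TRIPLES = [
    ("JoeSmith", "LivesIn", "SanDiego"),
    ("JoeSmith", "HasChild", "BillSmith"),
    ("JoeSmith", "HasChild", "JaneSmith"),
    ("JoeSmith", "Hobby", "Hiking"),
    ("MaryJones", "LivesIn", "SanDiego"),
    ("MaryJones", "NickName", "MJ"),
    ("Pants962", "Color", "Brown"),
    ("Pants962", "Inseam", 32),
]


def describe(subject, triples):
    """Group every triple about `subject` by predicate (hypothetical helper)."""
    props = defaultdict(list)
    for s, p, o in triples:
        if s == subject:
            props[p].append(o)
    return dict(props)


joe = describe("JoeSmith", TRIPLES)
mary = describe("MaryJones", TRIPLES)
print(joe)   # HasChild simply has two values; no join table is involved
print(mary)  # Mary has no Hobby triple, so nothing is stored for it
```

Subjects of different kinds (people, products, cities) sit side by side in the same list, which is the "heterogeneous" point in the slide title.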

Page 6: Semantic Graph Databases: The Evolution of Relational Databases


SPARQL… Like SQL, but...

• No separately declared schema. The ontology (a fancy word for schema) is carried in the data itself.
• Further ontology information may also be called out in the data, such as inference rules.
• Standard SQL aggregates, joins, etc., plus simple and powerful relationship capabilities.
• “How is Joe related to Mary?”
  – In SQL Relational
    • Are they spouses?
    • Are they siblings?
    • Are they friends?
    • Do they have the same hobby?
    • … enumerate the choices, EXPLODES with degrees of separation
  – In SPARQL Graph
    • How is Joe related to Mary?
    • … you can directly specify degrees of separation
• Pretty exciting: essentially all the power of SQL, but you can do more, with more diverse data, where the data tells you about itself, rather than you knowing in advance.
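The "degrees of separation" idea can be sketched in Python as a bounded breadth-first search over triples. This is only an illustration under assumptions: the function names `neighbors` and `connections` are hypothetical, edges are followed in both directions, and a real SPARQL engine would evaluate this very differently.

```python
from collections import deque

TRIPLES = [
    ("JoeSmith", "IsSpouse", "MaryJones"),
    ("JoeSmith", "LivesIn", "SanDiego"),
    ("MaryJones", "LivesIn", "SanDiego"),
    ("JoeSmith", "HasChild", "BillSmith"),
    ("MaryJones", "HasChild", "BillSmith"),
    ("JoeSmith", "Attended", "EDW2016"),
]


def neighbors(node, triples):
    """Yield (label, other_node) for every edge touching `node`."""
    for s, p, o in triples:
        if s == node:
            yield p, o
        elif o == node:
            yield "^" + p, s  # "^" marks an edge followed backwards


def connections(start, goal, triples, max_hops):
    """Breadth-first search for simple paths of at most `max_hops` edges."""
    found = []
    queue = deque([(start, [], {start})])
    while queue:
        node, path, seen = queue.popleft()
        if node == goal and path:
            found.append(path)
            continue
        if len(path) == max_hops:
            continue
        for label, nxt in neighbors(node, triples):
            if nxt not in seen:
                queue.append((nxt, path + [label], seen | {nxt}))
    return found


print(connections("JoeSmith", "MaryJones", TRIPLES, max_hops=1))
print(connections("JoeSmith", "MaryJones", TRIPLES, max_hops=2))
```

Nothing in the search names a specific relationship type, so no list of candidate relationships has to be enumerated up front; only the hop limit changes.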

Page 7: Semantic Graph Databases: The Evolution of Relational Databases


There Will Be a Quiz (Not Really)

“How is Joe related to Mary?”

SELECT * WHERE
  JoeSmith $connection MaryJones

JoeSmith IsSpouse MaryJones

“What do Joe and Mary have in common, to the first degree?”

SELECT $connection COUNT(*) WHERE
  JoeSmith $connection $thing
  MaryJones $connection $thing
GROUP BY $connection

FriendsWith 45
Attended 342
LivesIn 1
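The two quiz queries can be mimicked in a few lines of Python over a small triple list. This is a sketch, not SPARQL: `direct_connections` and `shared_connections` are hypothetical helpers, and the toy data is taken from the earlier triples slide, so the counts differ from the slide's sample output, which presumably comes from a much larger dataset.

```python
from collections import Counter

TRIPLES = [
    ("JoeSmith", "LivesIn", "SanDiego"),
    ("JoeSmith", "IsSpouse", "MaryJones"),
    ("JoeSmith", "HasChild", "BillSmith"),
    ("JoeSmith", "HasChild", "JaneSmith"),
    ("JoeSmith", "Attended", "EDW2016"),
    ("MaryJones", "LivesIn", "SanDiego"),
    ("MaryJones", "IsSpouse", "JoeSmith"),
    ("MaryJones", "HasChild", "BillSmith"),
    ("MaryJones", "HasChild", "JaneSmith"),
    ("MaryJones", "Attended", "Commicon16"),
]


def direct_connections(a, b, triples):
    """Predicates that link `a` directly to `b` (first query)."""
    return [p for s, p, o in triples if s == a and o == b]


def shared_connections(a, b, triples):
    """Count, per predicate, the objects that `a` and `b` both point to (second query)."""
    a_pairs = {(p, o) for s, p, o in triples if s == a}
    b_pairs = {(p, o) for s, p, o in triples if s == b}
    return Counter(p for p, _ in a_pairs & b_pairs)


print(direct_connections("JoeSmith", "MaryJones", TRIPLES))
print(shared_connections("JoeSmith", "MaryJones", TRIPLES))
```

Joe and Mary attended different events in this toy data, so `Attended` does not appear among the shared connections.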

Page 8: Semantic Graph Databases: The Evolution of Relational Databases


And Yes, Standard SQL Analytics

“What is the population and average personal income of each city?”

SELECT $city count($person) avg($income) WHERE
  $person LivesIn $city
  $person Earns $income
GROUP BY $city ORDER BY $city

Atlanta 647,465 34,459
Boston 856,123 42,654
Chicago 1,456,589 39,475
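The same group-and-aggregate shape can be sketched in Python. The people and incomes below are invented purely for illustration (the deck shows no `Earns` triples), and `city_stats` is a hypothetical helper rather than anything from a real engine.

```python
from collections import defaultdict
from statistics import mean

# Invented sample data; only the predicate names come from the slide.
TRIPLES = [
    ("Ann", "LivesIn", "Atlanta"), ("Ann", "Earns", 30000),
    ("Bob", "LivesIn", "Atlanta"), ("Bob", "Earns", 40000),
    ("Cy", "LivesIn", "Boston"), ("Cy", "Earns", 50000),
    ("Dee", "LivesIn", "Boston"),  # no Earns triple, so Dee matches nothing
]


def city_stats(triples):
    """Join LivesIn with Earns on person, then group by city, ordered by city."""
    earns = defaultdict(list)
    for s, p, o in triples:
        if p == "Earns":
            earns[s].append(o)
    incomes_by_city = defaultdict(list)
    for s, p, o in triples:
        if p == "LivesIn":
            for income in earns.get(s, []):
                incomes_by_city[o].append(income)
    return [(city, len(v), mean(v)) for city, v in sorted(incomes_by_city.items())]


for row in city_stats(TRIPLES):
    print(row)
```

One consequence of the two-pattern join is visible here: a person with no `Earns` triple is not counted, so the "population" column is really the number of residents with a recorded income.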

Page 9: Semantic Graph Databases: The Evolution of Relational Databases


RDF Opens a World of Data and Relationships

• Created by the World Wide Web Consortium (W3C), the folks who bring us HTTP, HTML, XML, etc. Geared to the vast quantity and richness of the Internet.
• Businesses and other organizations have much richer and more varied data than they have been able to work with.
• The trend has been toward the Data Swamp - bring it together and hope something can be gleaned from it.
• RDF Triples are a simple way to describe and query nearly anything, even unstructured material.
• Shameless Plugs:
  – Anzo Smart Data Lake - overlay layer on the data swamp to get meaning.
  – Anzo Smart Data Integration - ETL into SDL to make the swampy mess useful, without losing details. Applies semantic (aka schema) annotation & structure.
  – Anzo Graph Engine - Analytics at scale on the SDL at interactive speeds.
  – Anzo On the Web - Query & visualize the results, without knowing SPARQL

Page 10: Semantic Graph Databases: The Evolution of Relational Databases


If This Is So Great, What Took So Long?

• We’ve understood graph for a while, but graph had:
  – Terrible performance at scale.
  – No application building/visualization tools for non-programmers.
  – No ETL support.
  – Too much hubris.
  – Too much “NoSQL” noise in the channel.

“If you want to teach people a new way of thinking, don’t bother trying to teach them. Instead, give them a tool, the use of which will lead to new ways of thinking.”

R. Buckminster Fuller

… but graph is not a new way of thinking, it maps to how you already think!

Page 11: Semantic Graph Databases: The Evolution of Relational Databases


Single Database Instance Across Many Nodes

• Behaves just like a single-node database, but faster
• More speed and more data by clustering
• Massively Parallel Processing - each CPU ‘owns’ a slice of the data that it operates primarily on. Data is re-sliced as intermediate results during the query.
• Not a new concept, has been evolving since the 1980s… Teradata, Netezza, Redshift, Hadoop...
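The slicing and re-slicing described above can be illustrated with a small single-process Python sketch. Everything here is an assumption made for illustration: the slice count, the CRC32-based placement, and the helper names `slice_of` and `distribute` are hypothetical and do not describe how any of the named products actually distribute data.

```python
import zlib

NUM_SLICES = 4  # hypothetical cluster size


def slice_of(key, num_slices=NUM_SLICES):
    """Deterministically map a key to a slice (Python's hash() is salted per process)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_slices


def distribute(rows, key_index, num_slices=NUM_SLICES):
    """Place each row on the slice chosen by the value at `key_index`."""
    slices = [[] for _ in range(num_slices)]
    for row in rows:
        slices[slice_of(row[key_index], num_slices)].append(row)
    return slices


TRIPLES = [
    ("JoeSmith", "LivesIn", "SanDiego"),
    ("MaryJones", "LivesIn", "SanDiego"),
    ("SanDiego", "Team", "Chargers"),
    ("SanDiego", "Team", "Padres"),
]

# At load time, each triple is owned by the slice of its subject.
by_subject = distribute(TRIPLES, key_index=0)

# Query: which teams play where each person lives? The join key is the city,
# the *object* of LivesIn, so those intermediate rows are re-sliced by object.
lives_in = [t for sl in by_subject for t in sl if t[1] == "LivesIn"]
resliced = distribute(lives_in, key_index=2)

# Each slice can now join locally, because matching rows share a slice.
results = []
for i in range(NUM_SLICES):
    teams_here = [t for t in by_subject[i] if t[1] == "Team"]
    for person, _, city in resliced[i]:
        for subj, _, team in teams_here:
            if subj == city:
                results.append((person, team))

print(sorted(results))
```

In a real cluster the re-slicing step is network traffic between nodes, which is why the interconnect matters so much for this style of processing.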

Page 12: Semantic Graph Databases: The Evolution of Relational Databases


Data Lake Subsets

• The lake is the “database”
• Multiple Graph Query Engine instances, usually on subsets
• Short term instances - load, query, toss

Page 13: Semantic Graph Databases: The Evolution of Relational Databases


It Is All About Speed at Scale

• Your time can never be recovered. Your customers will find a better vendor. Your patients will not thrive. The bad guys will win.
• How to get speed:
  – Leverage well-understood MPP concepts
  – Lessons from Netezza and ParAccel, technology is an evolution
    • Understand and engineer for the interconnect.
    • Memory is far faster than disk, so compress and be in-memory.
    • Memory is slow, run ‘close to the silicon’ by using dynamically generated code… try to do everything in machine registers.
  – Design specifically for Graph
    • Similar to relational, but different.
    • Engineer for dynamic, heterogeneous data typing.
  – But, keep it simple to deploy and use.
• Done right, can be hundreds of times faster than other implementations or thousands of times faster than other big data approaches

Page 14: Semantic Graph Databases: The Evolution of Relational Databases


Illustrative Pharma Company Use Case

Page 15: Semantic Graph Databases: The Evolution of Relational Databases
