Cassandra and TitanDB: Insights into DataStax's Graph Strategy

Robin Schumacher, VP Products; Dr. Matthias Broecheler, Director of Engineering

Upload: datastax

Post on 14-Jul-2015


TRANSCRIPT

Page 1: Data stax webinar cassandra and titandb insights into datastax graph strategy v3 wc

Cassandra and TitanDB: Insights into DataStax's Graph Strategy

Robin Schumacher – VP Products

Dr. Matthias Broecheler – Director of Engineering

Page 2

Agenda

• Overview of DataStax

• Introduction to Graph

• Comparing Graph to an RDBMS

• A Look at DataStax’s Graph Strategy

• Next Steps

©2015 DataStax

Page 3

Overview

Founded in April 2010

Santa Clara, Austin, New York, London, Paris, Tokyo, Sydney

450+ and 410+ (Employees, Customers)

30 Percent

Page 4

Evolution of Data Management

1970s: Mainframe

1990s: Client-Server

Today: Cloud, Mobile, Social

Infrastructure centric:

• Monolithic hardware

• Centralized workloads

• Vendor lock-in

• General purpose databases (one size fits all)

• Isolated / semi-connected

Application / data centric:

• Commodity hardware

• Distributed workloads

• Massive scalability

• Radically connected

Page 5

Cassandra – NoSQL for Modern Enterprise Workloads

Always on

Fully distributed

Best in scale and performance

80%+ of contributions come from DataStax

Free tools and drivers

Free training

San Francisco, Stockholm, New York

Page 6

Enabling The Internet Enterprise with DataStax Enterprise

Page 7

Introduction to Graph

Page 8

What is a Graph Database?

High Level: Used to manage highly connected or complex data

User Level: Used to support traversal and analytic queries against a data model that uses vertices, edges and properties to represent and store data

Technical Level: Uses specialized index structures, data partitioning techniques, and query optimizers to efficiently traverse large graphs

Page 9

What is a Graph Database?

[Figure: example graph. Vertices: DataStax, DataBricks, Spark, DSE, Cassandra, Jonathan Ellis, Robin Schumacher, Billy Bosworth. Edge labels: worksFor, develops, uses, reportsTo. Properties: title: VP Product, title: CTO, title: CEO.]

Page 10

What is a Graph Database?

[Figure: the example graph from the previous page (DataStax, DataBricks, Spark, DSE, Cassandra and three people, joined by worksFor, develops, uses, and reportsTo edges), with callouts marking a Property, an Edge, and a Vertex.]
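The three building blocks named on this slide (vertex, edge, property) can be sketched in a few lines of plain Python. This is only an illustration of the data model, not how Titan or DSE Graph stores data. The class and method names are made up for the example, and the way the edges are wired between vertices is an assumption, since the transcript lists the labels but not which vertices each edge connects.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """A node in the graph, with free-form key/value properties."""
    name: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A labeled, directed relationship that can carry its own properties."""
    source: str
    label: str
    target: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical in-memory property graph, for illustration only."""

    def __init__(self):
        self.vertices = {}
        self.edges = []

    def add_vertex(self, name, **properties):
        self.vertices[name] = Vertex(name, properties)

    def add_edge(self, source, label, target, **properties):
        self.edges.append(Edge(source, label, target, properties))

    def out(self, name, label):
        """Names of vertices reached from `name` over outgoing `label` edges."""
        return [e.target for e in self.edges if e.source == name and e.label == label]

    def in_(self, name, label):
        """Names of vertices that point at `name` with a `label` edge."""
        return [e.source for e in self.edges if e.target == name and e.label == label]


graph = PropertyGraph()
for name in ["DataStax", "DSE", "Cassandra", "Robin Schumacher", "Billy Bosworth"]:
    graph.add_vertex(name)

# Assumed wiring: the transcript does not say which vertices each edge joins,
# or whether "title" sits on the edge or on the person vertex.
graph.add_edge("Robin Schumacher", "worksFor", "DataStax", title="VP Product")
graph.add_edge("Billy Bosworth", "worksFor", "DataStax", title="CEO")
graph.add_edge("DataStax", "develops", "DSE")
graph.add_edge("DSE", "uses", "Cassandra")

employees = graph.in_("DataStax", "worksFor")
products = graph.out("DataStax", "develops")
```

Here `employees` and `products` are each answered by following edges from a single vertex, which is the kind of traversal query the "User Level" definition refers to.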

Page 11

A Graph Database Helps Answer Queries Like…

…should an initiated transaction be considered fraudulent or malicious based on past user actions or normal patterns of system behavior?

…what products or actions should we recommend to a user based on their preferences and behavioral patterns to maximize sales or user engagement?

…what campaigns should be run for different segments of a company’s customer base?

Page 12

Comparing Graph DB to RDBMS

Page 13

Key Difference Between Graph DB and RDBMS

RDBMS: Process to query data elements (joins) is inefficient on large data sets or many relationships

Graph DB: Better performance for relationship queries due to specialized index structures

RDBMS: Expressing JOIN-intensive queries in SQL is time-consuming and error-prone

Graph DB: Intuitive query language enabling faster application development
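One way to picture the performance point is to compare a lookup that inspects a whole table of relationship rows with one that follows references stored next to the starting record. The sketch below is a deliberately simplified assumption: the sample data and function names are invented, a real RDBMS would normally use an index rather than a full scan, and the adjacency dictionary is not a description of Titan's actual storage layout.

```python
# Relational-style storage: one flat table of (customer_id, product) rows.
# Hypothetical sample data, not taken from the slides.
purchases = [
    ("alice", "laptop"),
    ("alice", "mouse"),
    ("bob", "laptop"),
    ("carol", "desk"),
]


def products_via_scan(customer_id):
    """Join-style lookup: every row in the table is inspected."""
    return [product for customer, product in purchases if customer == customer_id]


# Graph-style storage: each vertex keeps direct references to its neighbors.
adjacency = {}
for customer, product in purchases:
    adjacency.setdefault(customer, []).append(product)


def products_via_adjacency(customer_id):
    """Traversal-style lookup: only the edges of one vertex are read."""
    return list(adjacency.get(customer_id, []))
```

Both functions return the same answer. The difference is that the second one does work proportional to the number of edges on one vertex rather than to the size of the whole table, and that gap widens with every additional hop in a query.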

Page 14

RDBMS vs. Graph DB: Query Complexity

SELECT TOP (5) [t14].[ProductName]
FROM (SELECT COUNT(*) AS [value],
             [t13].[ProductName]
      FROM [customers] AS [t0]
      CROSS APPLY (SELECT [t9].[ProductName]
                   FROM [orders] AS [t1]
                   CROSS JOIN [order details] AS [t2]
                   INNER JOIN [products] AS [t3]
                     ON [t3].[ProductID] = [t2].[ProductID]
                   CROSS JOIN [order details] AS [t4]
                   INNER JOIN [orders] AS [t5]
                     ON [t5].[OrderID] = [t4].[OrderID]
                   LEFT JOIN [customers] AS [t6]
                     ON [t6].[CustomerID] = [t5].[CustomerID]
                   CROSS JOIN ([orders] AS [t7]
                               CROSS JOIN [order details] AS [t8]
                               INNER JOIN [products] AS [t9]
                                 ON [t9].[ProductID] = [t8].[ProductID])
                   WHERE NOT EXISTS(SELECT NULL AS [EMPTY]
                                    FROM [orders] AS [t10]
                                    CROSS JOIN [order details] AS [t11]
                                    INNER JOIN [products] AS [t12]
                                      ON [t12].[ProductID] = [t11].[ProductID]
                                    WHERE [t9].[ProductID] = [t12].[ProductID]
                                      AND [t10].[CustomerID] = [t0].[CustomerID]
                                      AND [t11].[OrderID] = [t10].[OrderID])
                     AND [t6].[CustomerID] <> [t0].[CustomerID]
                     AND [t1].[CustomerID] = [t0].[CustomerID]
                     AND [t2].[OrderID] = [t1].[OrderID]
                     AND [t4].[ProductID] = [t3].[ProductID]
                     AND [t7].[CustomerID] = [t6].[CustomerID]
                     AND [t8].[OrderID] = [t7].[OrderID]) AS [t13]
      WHERE [t0].[CustomerID] = N'ALFKI'
      GROUP BY [t13].[ProductName]) AS [t14]
ORDER BY [t14].[value] DESC

VS.

g.V('customerId','ALFKI').as('customer')
  .out('ordered').out('contains').out('is').as('products')
  .in('is').in('contains').in('ordered').except('customer')
  .out('ordered').out('contains').out('is').except('products')
  .groupCount().cap().orderMap(T.decr)[0..<5].productName
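Both queries on this slide ask the same question: which five products are ordered most often by customers who share a purchase with ALFKI, leaving out what ALFKI already has. The Python sketch below walks through that logic step by step on a tiny in-memory data set. The sample orders and the `recommend` function are hypothetical, the intermediate order and order-detail hops are collapsed into one list, and the code follows the intent of the queries (the SQL's NOT EXISTS exclusion) rather than the exact semantics of each Gremlin step.

```python
from collections import Counter

# Hypothetical sample data: (customer id, products in one order).
ORDERS = [
    ("ALFKI", ["Chai"]),
    ("BERGS", ["Chai", "Tofu", "Ikura"]),
    ("CHOPS", ["Chai", "Tofu"]),
]


def recommend(customer_id, limit=5):
    # Step 1: products the customer has already ordered.
    mine = {p for customer, products in ORDERS if customer == customer_id for p in products}

    counts = Counter()
    for product in mine:
        # Step 2: walk back from each product to the other customers who ordered it.
        for peer, peer_products in ORDERS:
            if peer == customer_id or product not in peer_products:
                continue
            # Step 3: walk forward to everything those customers ordered,
            # skipping what the original customer already has.
            for other_peer, candidates in ORDERS:
                if other_peer == peer:
                    counts.update(c for c in candidates if c not in mine)

    # Step 4: most frequently reached products first, capped at `limit`.
    return [name for name, _ in counts.most_common(limit)]
```

Calling `recommend("ALFKI")` on this data ranks Tofu ahead of Ikura, because two of ALFKI's peers ordered Tofu and only one ordered Ikura.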

Page 15

RDBMS vs. Graph DB: Data Modeling

VS.

Page 16

Comparing Graph DB to NoSQL

Page 17

Key Difference Between Graph DB and NoSQL

NoSQL: Data model can’t represent relationships between rows or documents, requiring application developers to maintain those inside the application, which is cumbersome, inefficient, and error prone

Graph DB: Natively supports relationships in the data model and provides a query language to efficiently retrieve them

Page 18

NoSQL vs. Graph DB: Query Expressivity

g.V('customerId','ALFKI').as('customer')
  .out('ordered').out('contains').out('is').as('products')
  .in('is').in('contains').in('ordered').except('customer')
  .out('ordered').out('contains').out('is').except('products')
  .groupCount().cap().orderMap(T.decr)[0..<5].productName

VS.

? (requires application code)
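To give a feel for what "requires application code" can mean, the sketch below models a key-value style store as two independent Python dictionaries. Everything here is an assumption made for illustration: the table names, the helper functions, and the sample data are invented, and a real NoSQL schema would look different. The point is only that the store has no notion of how the two tables relate, so the application writes both directions of every relationship and chases the references itself.

```python
# Two independent "tables" in a key-value store, modeled here as dicts.
# Hypothetical layout; the store itself knows nothing about how they relate.
orders_by_customer = {}    # customer id -> list of product names
customers_by_product = {}  # product name -> list of customer ids


def record_order(customer_id, product):
    """The application must write both directions and keep them in sync."""
    orders_by_customer.setdefault(customer_id, []).append(product)
    customers_by_product.setdefault(product, []).append(customer_id)


def also_bought(customer_id):
    """Each hop of the relationship is a separate lookup issued by the application."""
    mine = set(orders_by_customer.get(customer_id, []))
    found = set()
    for product in mine:                                        # first hop
        for peer in customers_by_product.get(product, []):      # second hop
            if peer != customer_id:
                found.update(orders_by_customer.get(peer, []))  # third hop
    return sorted(found - mine)


record_order("ALFKI", "Chai")
record_order("BERGS", "Chai")
record_order("BERGS", "Tofu")
```

If `record_order` ever updated one dictionary without the other, `also_bought` would silently return wrong results, which is the maintenance burden the previous slide describes.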

Page 19

A Look at DataStax’s Graph Strategy

Page 20

Product Strategy for 2015

• Part of DataStax’s product strategy in 2015 will be to support multiple data models in DataStax Enterprise (DSE)

• Support for multi-model will occur across several releases of DSE in 2015

Page 21

Why Multi-Model in DataStax Enterprise?

Mixed Workload Needed? Solved in DSE: Transactions, Analytics, Search

Mixed Model Needed? Solved in DSE: Wide Row, Graph, JSON

Page 22

Why Graph?

Page 23

Why Graph?

• Best answer for applications having highly connected data

• Key enabler of systems of engagement and systems of insight applications

• Use cases include:

  • Personalization

  • Social engagement systems (e.g. matchmaking services, contacts catalogs, etc.)

  • Fraud detection

  • Financial analysis

  • Security analysis

  • Communication

  • Supply chain management

Page 24

Page 25

Titan – the Foundation for DSE Graph

• Titan is a scalable, distributed graph database that is optimized for storing, traversing and querying complex graph data in real time

• Titan is open source and licensed under the Apache 2 license

• Current technical benefits include:

  • Built on top of Cassandra, HBase, and BerkeleyDB

  • Scale-out and multi-data center capable

  • Able to support thousands of concurrent users and billions of graph data points

  • Analytics on graph data supported via Hadoop integration

  • Search enabled via support for Solr, Lucene, and Elasticsearch

Page 26

What is DataStax Enterprise Graph?

DSE Graph is a scalable graph database solution for modern Web and mobile applications that need to manage highly connected data

DSE Graph will be deeply integrated into the DSE platform:

• Tight Cassandra integration

• Graph analytics powered by Spark

• DSE Search support

• OpsCenter monitoring

Page 27

2015 Plans for Titan / DSE Graph

• DataStax will contribute to TinkerPop and is dedicated to making it the #1 open source graph framework

• Release Titan 1.0 (compatible with TinkerPop 3 (TP3), a prerequisite that will come out 1-2 months before)

• First release of DSE Graph to occur in DSE 5.0. EAP builds will be available for interested customers

• Customers are advised to continue developing with TinkerPop to ensure seamless compatibility with DSE Graph

• DataStax to provide utilities/instructions for moving existing Titan databases to DSE Graph

Page 28

Next Steps

• Check the DataStax blog for updates on DSE Graph

• If you are a current DSE customer, contact us about participating in upcoming Early Adopter Program (EAP) releases of DSE Graph

• If you haven’t tried DSE yet, download it from http://www.datastax.com/download and follow our getting started guide in your own environment (or use the DataStax Sandbox)

Page 29

Thank you!

Questions?