DataStax Enterprise: The Multi-Model Data Platform
1 Multi-Model Defined
2 The Evolution of Models
3 Graph Databases Overview
4 The DataStax Approach to Graph Databases
5 DataStax in the Open Source Graph Community
6 Conclusion
© 2015. All Rights Reserved.
Multi-Model Defined
Most database management systems are organized around a single data model that determines how data can be organized, stored, and manipulated. In contrast, a multi-model database is designed to support multiple data models against a single, integrated backend.
Source: https://en.wikipedia.org/wiki/Multi-model_database
The Evolution of Models
[Diagram: three stages labeled RDBMS / SQL, Polyglot Persistence, and Multi Model, each under an Application layer; other labels: Application (OLTP and OLAP), Data Access Abstraction]
DSE Multi Model
Multiple (polyglot) models exposed via cohesive interaction mechanisms (APIs) for OLTP and OLAP workloads (mixed workload), with a unified persistence layer (Apache Cassandra) providing GR and always-on characteristics within a TCO-efficient database platform for a variety of use cases.
Cassandra
©2015 DataStax Confidential. Do not distribute without consent.
DataStax Enterprise – Wide Row / MVs
[Diagram labels: C1, C2, MV1c1, MV1c2, Agg1c1, Agg2c1]
DataStax Enterprise - JSON
Inserting JSON data is easy
Reading JSON data is easy
Finding JSON errors is easy
Introduction to Graph Databases
What is a Graph Database?
• Store, manage and query highly connected data
• Data stored as nodes (vertices), edges and properties to represent a domain model
• By explicitly embedding relationships in the data model, you store a more logical business model using the natural data access language Gremlin
• Think of a graph as a pre-joined database
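The "pre-joined" idea in the bullets above can be illustrated with a minimal sketch in plain Python. This is not DSE Graph or TinkerPop code; the `PropertyGraph` class, its method names, and the sample vertices are all hypothetical and exist only to show vertices, edges and properties stored together so that following a relationship is a lookup rather than a query-time join.

```python
from collections import defaultdict


class PropertyGraph:
    """Tiny in-memory property graph (illustrative only, not a DSE API)."""

    def __init__(self):
        self.vertices = {}                  # vertex id -> property dict
        self.out_edges = defaultdict(list)  # vertex id -> [(label, target id, edge props)]

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props

    def add_edge(self, src, label, dst, **props):
        # The relationship is stored with its source vertex, so following it
        # later is a direct lookup instead of a join computed at query time.
        self.out_edges[src].append((label, dst, props))

    def out(self, vid, label):
        """Return ids of vertices reachable from vid over edges with this label."""
        return [dst for (lbl, dst, _) in self.out_edges[vid] if lbl == label]


graph = PropertyGraph()
graph.add_vertex("alice", label="person", age=34)
graph.add_vertex("bob", label="person", age=29)
graph.add_vertex("acme", label="company")
graph.add_edge("alice", "knows", "bob", since=2012)
graph.add_edge("bob", "worksFor", "acme", role="engineer")

# Two hops: where do the people alice knows work?
employers = [
    company
    for friend in graph.out("alice", "knows")
    for company in graph.out(friend, "worksFor")
]
print(employers)
```

In a relational schema the same question would need a join per hop; here each hop reads an adjacency list that was written when the edge was created.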
Choose the Right Model to Fit your Business Needs
DSE Wide Row
- Build and maintain models using CQL’s DDL features
- Super-fast CRUD at scale with CQL DML features
- Good option for denormalized schemas with high-throughput requirements
- Perfect fit for IoT applications that require consuming enormous amounts of data with specific data retrieval requirements: product catalogs, high-volume messaging systems, collecting and storing sensor data
DSE Graph
- Flexible schema that is easy to modify and maintain with Gremlin
- Clearly maps business semantics to a logical model for easy maintenance and understanding
- Ideal model for highly connected data
- Perfect fit for social-engagement models, recommendation engines and IT network / device management
- Update and query the graph in real time with the easy-to-learn, open-source Gremlin language
- Good option for transitioning from slow RDBMS 3NF models with lots of JOINs
What is DataStax Enterprise (DSE) Graph?
• Highly scalable graph database for modern web, mobile, and IoT applications that need to manage highly connected and heterogeneous data
• Built-in support for real-time search and analytic graph queries via tight integration with the DSE platform
• A property graph model native inside the DataStax product, engineered specifically for Cassandra
• Store and find relationships in data quickly and easily on huge graphs
DSE Graph: Built-in Scalability
• Scale-out graph vs. scale-up only
• Graph partitioning built on Cassandra’s scale-out architecture
• Graph index structures integrated into Cassandra
• Domain model maps more naturally to data model, allowing for greater understanding between business and IT
• Traverse millions of relationships in a short period of time, faster than modeling the data in RDBMS
• Flexible data model that can be easily adapted to business changes
DSE Graph: Integrated Search, Analytics and Ops
• Real-time traversals over complex-structured graph data
• Integrates with DSE Search to mix search with traversal queries
• Integrates with DSE Analytics and Spark to support OLAP and breadth-first graph traversals
• Iterative graph analytics like PageRank or other centrality measures
• Reporting and aggregates over graph data
• Integrated with DataStax OpsCenter
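As a rough illustration of what "iterative graph analytics like PageRank" means, here is a minimal single-process sketch in plain Python. It is not how DSE Analytics or Spark would run the computation; the function name `pagerank`, the damping factor of 0.85, the fixed iteration count and the sample links are assumptions made for the example.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of vertex -> list of out-neighbours.

    Every neighbour must also appear as a key of out_links.
    """
    vertices = list(out_links)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Every vertex starts each round with the "random jump" share.
        new_rank = {v: (1.0 - damping) / n for v in vertices}
        for v in vertices:
            targets = out_links[v]
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A vertex with no outgoing edges spreads its rank evenly.
                share = damping * rank[v] / n
                for t in vertices:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical link structure: "c" is pointed to by three other vertices.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
top = max(ranks, key=ranks.get)
print(top, round(sum(ranks.values()), 6))
```

Each round needs the whole previous round's result, which is why this kind of measure is usually treated as an OLAP-style batch job rather than a real-time traversal.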
Graph Database Use Cases
Additional Graph Use Cases
360 Degree View of Your Customer
• Collect massive amounts of data points about your customer
• Data collected from social networks, web analytics, digital ads, mobile devices, CRM
• Bring heterogeneous customer data together into DSE Graph
• Uncover buying patterns and customer behaviors
• Graph becomes a master data hub for customer data
• Use the graph customer hub to build better products for your target customers
• Keep the customers you already have with customer intelligence
[Diagram: Customer 360 View; labels: Social, Store Sensors, CRM, Mobile, Weblogs]
Additional Graph Use Cases
IT Network and Device Management
• Allows IT to monitor, manage and protect corporate networks and devices (laptops, iPads, mobile phones, etc.)
• Requires understanding of network topologies and relationships between devices, interfaces, equipment, people, services …
• A traditional RDBMS would require expensive query-time joins
• A graph model intrinsically knows how to traverse the topology because the relationships are already stored
• This makes for quick and easy recognition of problems, root cause analysis and event correlation
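To make the root cause and event correlation point concrete, here is a small sketch in plain Python rather than Gremlin or any DSE API. The topology, the device names and the helpers `downstream` and `root_cause_candidates` are all hypothetical. The idea is that stored "feeds" relationships can be walked breadth-first, and an alerting device whose downstream set covers every other alert is a plausible root cause.

```python
from collections import deque


def downstream(feeds, start):
    """Breadth-first walk: every device reachable from start over 'feeds' edges."""
    seen = set()
    queue = deque([start])
    while queue:
        device = queue.popleft()
        for nxt in feeds.get(device, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def root_cause_candidates(feeds, alerting):
    """Alerting devices whose downstream set explains every other alert."""
    alerting = set(alerting)
    return sorted(d for d in alerting if alerting - {d} <= downstream(feeds, d))


# Hypothetical topology: an edge A -> B means "B depends on A".
feeds = {
    "core-router": ["switch-1", "switch-2"],
    "switch-1": ["laptop-17", "phone-42"],
    "switch-2": ["printer-3"],
}
alerts = ["laptop-17", "phone-42", "switch-1"]
print(root_cause_candidates(feeds, alerts))
```

With these sample alerts, the only device that explains the other two is `switch-1`; the walk never computes a join, it only follows edges that already exist.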
DataStax in the Graph Open Source Community: TinkerPop / Gremlin
DataStax Role in TinkerPop
• DataStax utilizes the TinkerPop framework for the DSE Graph product
• DataStax will contribute to the TinkerPop community and is heavily invested in the success of the Gremlin language
• DataStax will provide resource guides, documentation, samples and training on building and querying graphs with Gremlin, using DSE Graph as the graph engine
Gremlin: The open-source standard graph query language
• DataStax contributes to and supports the Apache TinkerPop community, along with the Gremlin graph query language
• g.V().hasLabel('person').as('a').out('knows').as('b').select('a','b').by('age').by('age')
  "For all people in the graph, give me the ages of the people on each end of a friendship relationship."
• g.V().has('name','marko').out('knows').out('mother').outE('worksFor').has('time',between(2001,2002)).inV().values('name')
  "What are the names of the places that marko's friends' mothers worked for from 2001 to 2002?"
• Deep traversal == multiple levels of query-time joins in RDBMS
Recommendation Query – RDBMS vs. Graph
SELECT TOP (5) [t14].[ProductName]
FROM (
  SELECT COUNT(*) AS [value], [t13].[ProductName]
  FROM [customers] AS [t0]
  CROSS APPLY (
    SELECT [t9].[ProductName]
    FROM [orders] AS [t1]
    CROSS JOIN [order details] AS [t2]
    INNER JOIN [products] AS [t3] ON [t3].[ProductID] = [t2].[ProductID]
    CROSS JOIN [order details] AS [t4]
    INNER JOIN [orders] AS [t5] ON [t5].[OrderID] = [t4].[OrderID]
    LEFT JOIN [customers] AS [t6] ON [t6].[CustomerID] = [t5].[CustomerID]
    CROSS JOIN ([orders] AS [t7]
      CROSS JOIN [order details] AS [t8]
      INNER JOIN [products] AS [t9] ON [t9].[ProductID] = [t8].[ProductID])
    WHERE NOT EXISTS(
        SELECT NULL AS [EMPTY]
        FROM [orders] AS [t10]
        CROSS JOIN [order details] AS [t11]
        INNER JOIN [products] AS [t12] ON [t12].[ProductID] = [t11].[ProductID]
        WHERE [t9].[ProductID] = [t12].[ProductID]
          AND [t10].[CustomerID] = [t0].[CustomerID]
          AND [t11].[OrderID] = [t10].[OrderID])
      AND [t6].[CustomerID] <> [t0].[CustomerID]
      AND [t1].[CustomerID] = [t0].[CustomerID]
      AND [t2].[OrderID] = [t1].[OrderID]
      AND [t4].[ProductID] = [t3].[ProductID]
      AND [t7].[CustomerID] = [t6].[CustomerID]
      AND [t8].[OrderID] = [t7].[OrderID]
  ) AS [t13]
  WHERE [t0].[CustomerID] = N'ALFKI'
  GROUP BY [t13].[ProductName]
) AS [t14]
ORDER BY [t14].[value] DESC
VS.
g.V('customerId','ALFKI').as('customer') \
  .out('ordered').out('contains').out('is').as('products') \
  .in('is').in('contains').in('ordered').except('customer') \
  .out('ordered').out('contains').out('is').except('products') \
  .groupCount().cap().orderMap(T.decr)[0..<5].productName
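The shape of the recommendation traversal can be sketched in plain Python to show what both queries compute. This is a simplification, not a translation: the order and order-detail hops are collapsed into a single customer-to-products mapping, and the `recommend` function and sample purchase data are hypothetical. Each product is scored by how many purchases its buyer shares with the target customer, which only approximates the path counting done by `groupCount()`.

```python
from collections import Counter


def recommend(purchases, customer, limit=5):
    """Rank products bought by customers who share a purchase with `customer`."""
    mine = purchases[customer]
    counts = Counter()
    for other, theirs in purchases.items():
        shared = len(mine & theirs)  # products linking the two customers
        if other == customer or shared == 0:
            continue
        for product in theirs - mine:  # skip what the customer already bought
            counts[product] += shared
    return [product for product, _ in counts.most_common(limit)]


# Hypothetical purchase history keyed by customer id.
purchases = {
    "ALFKI": {"chai", "tofu"},
    "BONAP": {"chai", "tofu", "ikura"},
    "CHOPS": {"chai", "konbu"},
    "DRACD": {"pavlova"},
}
print(recommend(purchases, "ALFKI"))
```

The loop mirrors the traversal steps: start at the customer, walk out to their products, walk back in to other customers, walk out again to those customers' products, drop what is already owned, then count and keep the top five.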
Thank you