Modern Data Architecture for Today’s Application Needs Alexander Gauthier Principal Solutions Engineer Strategic Accounts [email protected]


Post on 22-May-2020


Page 1: Modern Data Architecture for Today’s Application Needs · A scalable, distributed graph database that is optimized for storing, traversing and querying complex graph data in real

Modern Data Architecture for

Today’s Application Needs

Alexander Gauthier

Principal Solutions Engineer

Strategic Accounts

[email protected]

Page 2

About This

Presenter…Southern California

DataStax – SE, Strategic Accounts

Hortonworks – Solutions Engineer

Teradata – Engineering, Pre-sales

Aster Data – Principal Engineer

Informatica – Senior Engineer

© DataStax, All Rights Reserved.

Page 3

Legacy approach to HA/DR

• Expensive DB

• Expensive Shared Storage

• Additional Replication Solution (SRDF, GoldenGate, Hitachi, etc.)

• Expensive Clustering Solution (VRTS Cluster Server, others)

• Now double all that!

Veritas Cluster Server Service

Page 4

Let’s take a moment

© 2017 DataStax, All Rights Reserved. Company Confidential

Page 5

billions

Page 6

Requirements for today’s applications

CONTEXTUAL · ALWAYS-ON · DISTRIBUTED · SCALABLE · REAL-TIME

Page 7

Apps have changed

1990s: Client/Server

2000s: Web

Today: Cloud

Page 8

And what powers them

DataStax is a registered trademark of DataStax, Inc. and its subsidiaries in the United States and/or other countries.

Page 9

And what powers them

Scale-out App Layer

Scale-out Data Layer

Page 10

Cloud Application Characteristics

Real-Time · Distributed · Always-On · Contextual · Scalable

Page 11

Powering Cloud Applications

Always-on

• Designed to handle any failure, no matter how catastrophic.

Effortless scale

• Take advantage of every opportunity.

• Focus on what matters most to you.

Instant insight

• Built into your application to create actionable, modern experiences.

Page 12

How it works

Page 13

DSE Architecture

[Diagram: an application layer on top of DSE, which spans DataCenter 1 and DataCenter 2.]

• Application Layer: Customer experience, Fraud Detection, IoT, Recommendation Engine, Enterprise Search

• Access: JDBC/ODBC, RESTful API

• Workloads: Transactional, Analytics, Analytical SQL, Search, Graph Analytics, Machine Learning

• APACHE CASSANDRA™: storage for any type of data. Fault-tolerant, Scalable, Performant, Secure, Unified

Page 14

Terminology

• Node: A single instance

• Rack: A logical grouping of nodes (optional)

• Data Center: A logical grouping of racks or nodes

• Cluster: A logical grouping of data centers (1 to N)

[Diagram: two data centers, DC1 and DC2, with racks RAC1 and RAC2.]

Page 15

Read/Write Anywhere

• Cassandra has a ‘location independence’ architecture, which allows any user to connect to any node in any data center and read/write the data they need

• All writes are automatically and evenly partitioned/distributed across the nodes and replicated throughout the cluster

Page 16

No single point of failure

• Best in class fault tolerance

• Replication automatically handled

• Remains operationally simple at scale

[Diagram: a client connected to a ring of eight nodes labeled 10 through 80. A node fails or goes down temporarily; we can still retrieve the data from the other 2 nodes.]
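The failure scenario on this slide can be sketched in a few lines of Python. This is a simplified illustration, not Cassandra's actual placement logic: the eight token values are taken from the ring labels on the slide, while the replication factor of 3, the clockwise placement rule, and the function names are assumptions made for the example.

```python
from bisect import bisect_left

# Hypothetical eight-node ring; the token values mirror the slide's node labels.
TOKENS = [10, 20, 30, 40, 50, 60, 70, 80]
REPLICATION_FACTOR = 3  # assumed: the slide implies three copies of each row


def replicas_for(key_token, tokens=TOKENS, rf=REPLICATION_FACTOR):
    """Return the rf nodes that hold key_token, walking clockwise on the ring."""
    start = bisect_left(tokens, key_token) % len(tokens)
    return [tokens[(start + i) % len(tokens)] for i in range(rf)]


def readable_replicas(key_token, down_nodes):
    """Replicas that can still serve the key while some nodes are down."""
    return [node for node in replicas_for(key_token) if node not in down_nodes]


owners = replicas_for(25)                           # nodes 30, 40 and 50
survivors = readable_replicas(25, down_nodes={30})  # node 30 is down
print(owners, survivors)
```

With node 30 down, the key is still readable from the two remaining replicas, which is the situation the diagram describes.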

Page 17

Multi-data center

• Replicate data across data centers or cloud availability zones

• No interruption to the business with any outage

• Global low-latency performance

[Diagram: two rings of eight nodes each, labeled West Data Center and East Data Center, with a client. A second panel labeled Outage shows the client connected to a single ring.]

Page 18

Flexible deployment options

Cloud: take full advantage of the cloud’s elasticity and global availability. With easy migrations to any and every cloud provider, you’ll never be locked in.

Hybrid: have the best of both worlds, spanning a single cluster across on-premises and cloud.

Hub and spoke: have a central hub with many spokes. Perfect for intermittent connections, compliance, or optimizing for location needs.

Page 19

Linear scalability

• Data partitioned among all nodes in the cluster

• Linear scalability (performance / storage)

[Diagram: three clusters of increasing size.]

• 50,000 trans/sec, 500 GB

• 100,000 trans/sec, 1 TB

• 200,000 trans/sec, 2 TB

Page 20

DataStax Reference Architecture

[Diagram components: HTTP Application, Message Queue, Streaming Analytics, Batch Analytics, Real-time.]

Page 21

Hybrid Multi-Cloud Architectures

• Distribute Fault

• Negotiate Locality

• Regional Compliance

[Diagram: an App Tier over AWS, AZURE, GCP and a PHYSICAL DC, linked by Replication.]

Page 22

Cassandra Data Model

• Row-oriented, column structure

• Keyspace: similar to a database in the RDBMS world

• Table: similar to an RDBMS table but more flexible/dynamic

• A row in a table (column family) is indexed by its key.

• Other columns may be indexed as well

[Diagram: a Portfolio Keyspace containing a Customer Table with columns ID, Name, SSN, DOB.]

Page 23

Cassandra Query Language (CQL)

• Syntax similar to RDBMS SQL

• Create objects via DDL, and read and write data with familiar statements

For example: CREATE, INSERT, UPDATE, DELETE, GRANT, REVOKE, SELECT, WHERE

CQL Example

CREATE TABLE market_prices (
    symbol TEXT,
    date   TIMESTAMP,
    price  DECIMAL,
    side   INT,
    PRIMARY KEY (symbol, date)
) WITH CLUSTERING ORDER BY (date DESC);
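To make the primary key in this example concrete, here is a small in-memory Python model of how such a table is organized: `symbol` acts as the partition key that groups rows together, and `date` is the clustering column that keeps each group sorted newest first. This is only a sketch of the idea; the `insert` and `latest` helpers and the sample symbols and prices are invented for illustration and are not part of any driver API.

```python
from collections import defaultdict
from datetime import datetime
from decimal import Decimal

# partition key (symbol) -> rows kept sorted by the clustering column (date), newest first
partitions = defaultdict(list)


def insert(symbol, date, price, side):
    """Upsert a row: (symbol, date) is the primary key, so a repeat overwrites."""
    rows = [r for r in partitions[symbol] if r["date"] != date]
    rows.append({"date": date, "price": price, "side": side})
    rows.sort(key=lambda r: r["date"], reverse=True)  # CLUSTERING ORDER BY (date DESC)
    partitions[symbol] = rows


def latest(symbol, limit=1):
    """Roughly what a query filtered on symbol with a LIMIT would return."""
    return partitions[symbol][:limit]


# Hypothetical sample data
insert("ABC", datetime(2017, 1, 2), Decimal("10.50"), 1)
insert("ABC", datetime(2017, 1, 3), Decimal("10.75"), 2)
insert("XYZ", datetime(2017, 1, 3), Decimal("99.00"), 1)
newest = latest("ABC")
print(newest)
```

Because rows within a partition are already stored in descending date order, fetching the most recent price for a symbol is a read from the front of one partition rather than a sort at query time.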

Page 24

Write Path

[Diagram: a write goes to the COMMIT LOG (ON DISK) and to the MEMTABLE (IN MEMORY); a FLUSH writes the memtable to SSTABLES (ON DISK). The diagram also carries the label "sequential". Tables have the columns ID, NAME, DOB.]

Example rows in the diagram:

AB1 John Smith 10/11/1972

AB2 Bob Jones 3/1/1964

ZZ3 Mike West 4/22/1968

AB3 Mary Smith 1/11/1982

AB4 Jane Hess 3/1/1992

AB1 Jonny Smith 10/11/1972 (a later write for ID AB1)
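The flow in the diagram can be sketched as a toy Python class. This is a deliberately simplified model under stated assumptions: the class name, the flush threshold of three rows, and the "newest SSTable wins" read rule are all invented for the example (a real node reconciles versions by write timestamp and also compacts SSTables). The sample rows are the ones shown on the slide.

```python
class WritePathSketch:
    """Toy model of commit log -> memtable -> SSTable flush."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only list, stands in for the on-disk log
        self.memtable = {}     # in-memory table keyed by row ID
        self.sstables = []     # immutable on-disk tables, oldest first
        self.flush_threshold = flush_threshold  # assumed value for the demo

    def write(self, row_id, name, dob):
        self.commit_log.append((row_id, name, dob))  # sequential append for durability
        self.memtable[row_id] = (name, dob)          # then update memory
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the memtable out sorted by key; the result is never modified again.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # log entries covered by the flush are no longer needed

    def read(self, row_id):
        if row_id in self.memtable:
            return self.memtable[row_id]
        for table in reversed(self.sstables):  # check the newest SSTable first
            if row_id in table:
                return table[row_id]
        return None


db = WritePathSketch()
db.write("AB1", "John Smith", "10/11/1972")
db.write("AB2", "Bob Jones", "3/1/1964")
db.write("ZZ3", "Mike West", "4/22/1968")    # third row triggers the first flush
db.write("AB3", "Mary Smith", "1/11/1982")
db.write("AB4", "Jane Hess", "3/1/1992")
db.write("AB1", "Jonny Smith", "10/11/1972")  # later write for AB1, triggers the second flush
print(db.read("AB1"))
```

Note that the update to AB1 never rewrites the first SSTable. The older version stays on disk and the read path simply prefers the newer one.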

Page 25

Putting it to use

Page 26

Integrated Multi-Model/Mixed Workload Platform

• Indexing & Search

• Streaming Analytics

• Graph

• Batch Analytics

Page 27

DSE Search

Live indexing engine with powerful search

• Automatic indexing on insert

• Higher ingestion throughput

• Distributed query optimization

Compared to self-managed:

• No separate search cluster to manage

• Probably less total hardware required

• No “Split Brain” data inconsistencies

• No ETL or sync to build and maintain

• No app level data management code

[Diagram: Your Application uses CQL to reach Search + Cassandra.]

Page 28

DSE Analytics

Streaming, ad-hoc, and batch

• High-performance

• Workload management

• SQL reporting

Compared to self-managed:

• No ETL

• True HA without Zookeeper

[Diagram: a single DSE cluster. Your Application sends Real Time Operations to Cassandra; Your Analytics sends Analytics Queries to Analytics; the two sides are linked by Real Time Replication.]

Page 29

DSE Graph

A scalable, distributed graph database that is optimized for storing, traversing and querying complex graph data in real time

• Value data between relationships

• DSE Analytics and Search integrated

• Perfect for use cases: Customer360, Recommendations, Fraud Detection

Page 30

RDBMS vs. Graph

©2016 DataStax

• A key difference between a graph database and an RDBMS is how relationships between entities/vertexes are prioritized and managed.

• While an RDBMS uses foreign keys to connect entities in a secondary fashion, edges (the relationships) in a graph database are of first order importance.

• Relationships are explicitly embedded in a graph data model.

• A graph-shaped business problem is one in which the concern is more with the relationships (edges) among entities (vertexes) than with the entities in isolation.

Concept | RDBMS | Graph

An identifiable “something” or object to keep track of | Entity | Vertex

A connection or reference between two objects | Relationship | Edge

A characteristic of an object | Attribute | Property
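As a rough illustration of the vertex, edge and property terms above, the following Python sketch stores edges as explicit records and answers a two-hop question by walking them. It is not DSE Graph or Gremlin code; the vertex names, edge labels and helper functions are all hypothetical.

```python
# Vertices carry properties; edges are stored explicitly as first-class records.
vertices = {
    "alice": {"label": "customer", "city": "Irvine"},
    "bob": {"label": "customer", "city": "San Diego"},
    "carol": {"label": "customer", "city": "Los Angeles"},
    "widget": {"label": "product", "price": 20},
}

edges = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "bought", "widget"),
]


def out(vertex, label):
    """Follow outgoing edges with the given label from a vertex."""
    return [dst for src, lbl, dst in edges if src == vertex and lbl == label]


def two_hops(vertex, first_label, second_label):
    """Walk two edges in sequence, e.g. friends of friends."""
    found = []
    for middle in out(vertex, first_label):
        for dst in out(middle, second_label):
            if dst not in found:
                found.append(dst)
    return found


friends_of_friends = two_hops("alice", "knows", "knows")
print(friends_of_friends)
```

In a relational schema the same question would typically be a self-join on a link table; here the relationship is the thing being traversed, which is the distinction the bullets draw.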

Page 31

RDBMS vs. Graph

RDBMS | Graph

Simple to moderate data complexity | Heavy data complexity

Hundreds of potential relationships | Hundreds of thousands to millions or billions of potential relationships

Moderate JOIN operations with good performance | Heavy to extreme JOIN operations required

Infrequent to no data model changes | Constantly changing and evolving data model

Static to semi-static data changes | Dynamic and constantly changing data

Primarily structured data | Structured and unstructured data

Nested or complex transactions | Simple transactions

Always strongly consistent | Tunable consistency (eventual to strong)

High availability (handled with failover) | Continuous availability (no downtime)

Centralized application that is location dependent (e.g. single location), especially for write operations and not just read | Distributed application that is location independent (multiple locations involving multiple data centers and/or clouds) for write and read operations

Scale up for increased performance | Scale out for increased performance (for some graph DBs)
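The "tunable consistency (eventual to strong)" entry can be made concrete with the usual overlap rule for replicated stores: a read is guaranteed to see the latest acknowledged write when the number of replicas read plus the number of replicas written exceeds the replication factor. The Python sketch below only does that arithmetic; the function names and the replication factor of 3 are assumptions for the example.

```python
def quorum(replication_factor):
    """Smallest majority of replicas."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas, write_replicas, replication_factor):
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


RF = 3  # assumed replication factor
q = quorum(RF)
print(is_strongly_consistent(q, q, RF))  # quorum reads and quorum writes
print(is_strongly_consistent(1, 1, RF))  # single-replica reads and writes
```

Quorum reads combined with quorum writes overlap on at least one replica, while single-replica reads and writes trade that guarantee for lower latency and are only eventually consistent.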

Page 32

Intelligent Data Layer

• Logic

• Learning

• Understanding

• Reasoning

• Retention

Page 33

Deloitte: Mission Graph

1. Explore and Analyze: Connect the dots across multiple datasets with a single search.

2. Visualize Sophisticated Networks: Create network diagrams that illustrate relationships and identify associations.

3. Enrich Analysis with Public Data: Link open-source data to help improve insight into high-value entities.

4. Link Unstructured Relationships: Extract unstructured data and pair it to structured data.

5. Machine Learning: Continuously improve targeting algorithms through intelligent machine learning. The more you use Mission Graph, the smarter it gets.

6. Proactively Manage Network Risk: Diagnose how influencers and events create risk to identify similar patterns and help prevent future incidents.

Page 34

Build and Manage

Page 35

Advanced security

External Authentication

• External validation of authorized users

• Leverages Kerberos & LDAP/AD

• Single sign-on to all data domains

Transparent Data Encryption

• Protects sensitive data at rest via encryption

• No changes needed at app level

Data Auditing

• Audit trail of all accesses and changes

• Control to audit only what’s needed

• Uses log4j interface to ensure performance & efficient audit

Page 36

Build how you want

Drivers

• ODBC

• JDBC

Page 37

DevCenter

Friendly GUI for CQL developers

• Visually Create and Navigate Database Objects

• Tune Queries for Faster Performance

• Powerful Context-Aware Editor

Page 38

Studio

Explore, query, and analyze DSE Graph

• Gremlin Query Language

• Auto-completion, result set visualization, execution management, and much more.

• Friendly Fluent API

Page 39

OpsCenter

Visual management for DSE

• Automate what no one likes: backups, repairs

• REST API to work in your world

• Instantly manage your cluster, scaling up or down at a moment’s notice

• Monitor your cluster and follow best practices, ensuring a secure environment

Page 40

Q&A
