Post on 22-May-2020

Modern Data Architecture for Today’s Application Needs

Alexander Gauthier

Principal Solutions Engineer

Strategic Accounts

alexander.gauthier@datastax.com

About This Presenter

Southern California

DataStax – SE, Strategic Accounts

Hortonworks – Solutions Engineer

Teradata – Engineering, Pre-sales

Aster Data – Principal Engineer

Informatica – Senior Engineer

© DataStax, All Rights Reserved.

Legacy approach to HA/DR

• Expensive DB
• Expensive shared storage
• Additional replication solution (SRDF, GoldenGate, Hitachi, etc.)
• Expensive clustering solution (VRTS Cluster Server, others)
• Now double all that!

Veritas Cluster Server Service

Let’s take a moment

© 2017 DataStax, All Rights Reserved. Company Confidential

billions

Requirements for today’s applications

• Contextual
• Always-on
• Distributed
• Scalable
• Real-time

Apps have changed

• 1990s: Client/Server
• 2000s: Web
• Today: Cloud

And what powers them

DataStax is a registered trademark of DataStax, Inc. and its subsidiaries in the United States and/or other countries.

Scale-out App Layer

Scale-out Data Layer

Cloud Application Characteristics

• Real-Time
• Distributed
• Always-On
• Contextual
• Scalable

Powering Cloud Applications

Effortless scale

Always-on

● Designed to handle any failure, no matter how catastrophic.
● Take advantage of every opportunity.
● Focus on what matters most to you.

Instant insight

● Built into your application to create actionable, modern experiences.

How it works

DSE Architecture

[Diagram: DSE architecture]

• Application layer: Customer experience, Fraud Detection, IoT, Recommendation Engine, Enterprise Search
• Access: JDBC/ODBC, RESTful API
• Workloads: Analytics, SQL, Analytical SQL, Machine Learning, Search, Graph
• Apache Cassandra™: storage for any type of data (fault-tolerant, scalable, performant, secure, unified)
• DataCenter 1 and DataCenter 2, each labeled Transactional and Analytics

Terminology

• Node: A single instance

• Rack: A logical grouping of nodes (optional)

• Data Center: A logical grouping of racks or nodes

• Cluster: A logical grouping of data centers (1 to N)

[Diagram: data centers DC1 and DC2, with racks RAC1 and RAC2]

Read/Write Anywhere

• Cassandra has a ‘location independence’ architecture, which allows any user to connect to any node in any data center and read/write the data they need
• All writes are automatically partitioned evenly across the nodes and replicated throughout the cluster
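The partitioning and replication described above can be sketched as a toy token ring. This is an illustration only, not Cassandra's implementation: the eight tokens mirror the ring in the slide diagram, the replication factor of 3 is an assumption (consistent with reading from "the other 2 nodes" after a failure), and the MD5-modulo-100 hash and the names `token_for`, `replicas_for`, and `readable_replicas` are hypothetical stand-ins for the real partitioner.

```python
import bisect
import hashlib

TOKENS = [10, 20, 30, 40, 50, 60, 70, 80]  # one token per node, as in the ring diagram
REPLICATION_FACTOR = 3                      # assumed for illustration


def token_for(key: str) -> int:
    """Map a partition key to a ring position (toy hash, range 0-99)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def replicas_for(key: str, rf: int = REPLICATION_FACTOR) -> list:
    """Return the tokens of the nodes holding the key: the first node whose
    token is >= the key's token, then the next rf - 1 nodes clockwise."""
    start = bisect.bisect_left(TOKENS, token_for(key)) % len(TOKENS)
    return [TOKENS[(start + i) % len(TOKENS)] for i in range(rf)]


def readable_replicas(key: str, down: set) -> list:
    """Replicas that can still serve the key when some nodes are down."""
    return [node for node in replicas_for(key) if node not in down]


if __name__ == "__main__":
    owners = replicas_for("customer:AB1")
    print("replicas:", owners)
    print("after one failure:", readable_replicas("customer:AB1", {owners[0]}))
```

With three replicas per key, losing any single node still leaves two copies to read from, which is the situation the ring diagram depicts.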

[Diagram: a client connected to a ring of eight nodes with tokens 10, 20, 30, 40, 50, 60, 70, and 80]

Node fails or goes down temporarily: we can still retrieve the data from the other 2 nodes

No single point of failure

• Best in class fault tolerance

• Replication automatically handled

• Remains operationally simple at scale

Multi-data center

• Replicate data across data centers or cloud availability zones
• No interruption to the business with any outage

• Global low-latency performance

[Diagram: a client with two eight-node rings labeled West Data Center and East Data Center (tokens 10 through 80 and 15 through 85); after an outage, the client is shown with a single remaining ring (tokens 10 through 80)]

Flexible deployment options

Cloud: take full advantage of the cloud’s elasticity and global availability. With easy migrations to any and every cloud provider, you’ll never be locked in.

Hybrid: have the best of both worlds, spanning a single cluster across on-premises and cloud.

Hub and spoke: have a central hub with many spokes. Perfect for intermittent connections, compliance, or optimizing for location needs.

Linear scalability

• Data partitioned among all nodes in the cluster

• Linear scalability (performance / storage)

[Diagram: cluster growth]

• 500 GB: 50,000 trans/sec
• 1 TB: 100,000 trans/sec
• 2 TB: 200,000 trans/sec

DataStax Reference Architecture

[Diagram: HTTP application, message queue, streaming analytics, batch analytics, real-time]

Hybrid Multi-Cloud Architectures

• Distribute Fault

• Negotiate Locality

• Regional Compliance

[Diagram: an app tier over AWS, Azure, GCP, and a physical DC, linked by replication]

Cassandra Data Model

• Row-oriented, column structure
• Keyspace: similar to a database in the RDBMS world
• Table: similar to an RDBMS table but more flexible/dynamic
• A row in a table (formerly called a column family) is indexed by its key
• Other columns may be indexed as well

[Diagram: Portfolio keyspace containing a Customer table with columns ID, Name, SSN, DOB]

Cassandra Query Language (CQL)

• Syntax similar to RDBMS SQL
• Create objects via DDL, and read or change data with SQL-like statements

For example: CREATE, INSERT, UPDATE, DELETE, GRANT, REVOKE, SELECT, WHERE

CQL Example

CREATE TABLE market_prices (
    symbol TEXT,       -- partition key
    date   TIMESTAMP,  -- clustering column
    price  DECIMAL,
    side   INT,
    PRIMARY KEY (symbol, date)
) WITH CLUSTERING ORDER BY (date DESC);
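To make the effect of PRIMARY KEY (symbol, date) and CLUSTERING ORDER BY (date DESC) more concrete, here is a small Python model of the layout. It is a sketch and not driver code: rows are grouped by partition key (symbol) and kept sorted by clustering column (date), newest first. The sample symbols and prices, and the helper names `insert` and `latest`, are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime
from decimal import Decimal

# Toy model: one list of rows per partition key (symbol), kept sorted by
# the clustering column (date) in descending order.
partitions = defaultdict(list)


def insert(symbol, date, price, side):
    """Upsert a row; a row with the same (symbol, date) is overwritten."""
    rows = [r for r in partitions[symbol] if r["date"] != date]
    rows.append({"date": date, "price": price, "side": side})
    rows.sort(key=lambda r: r["date"], reverse=True)  # date DESC
    partitions[symbol] = rows


def latest(symbol, limit=1):
    """Roughly: SELECT * FROM market_prices WHERE symbol = ? LIMIT ?"""
    return partitions[symbol][:limit]


insert("ABC", datetime(2020, 5, 20), Decimal("10.10"), 0)
insert("ABC", datetime(2020, 5, 22), Decimal("10.45"), 1)
insert("ABC", datetime(2020, 5, 21), Decimal("10.30"), 0)
insert("XYZ", datetime(2020, 5, 22), Decimal("99.00"), 1)

print(latest("ABC"))  # newest ABC row first, no sorting needed at read time
```

Because the order is fixed when rows are written, a "most recent prices for one symbol" query reads the front of a single partition.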

Write Path

[Diagram: write path. Labels: IN MEMORY (MEMTABLE), ON DISK (COMMIT LOG, SSTABLES), FLUSH from memory to disk, sequential. Sample rows shown in the diagram:]

ID | NAME | DOB
AB1 | John Smith | 10/11/1972
AB2 | Bob Jones | 3/1/1964
ZZ3 | Mike West | 4/22/1968

ID | NAME | DOB
AB3 | Mary Smith | 1/11/1982
AB4 | Jane Hess | 3/1/1992
AB1 | Jonny Smith | 10/11/1972

ID | NAME | DOB
AB3 | Mary Smith | 1/11/1982
AB4 | Jane Hess | 3/1/1992
AB1 | Jonny Smith | 10/11/1972
ZZ3 | Mike West | 4/22/1968
AB2 | Bob Jones | 3/1/1964
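The write path in the diagram (append to the commit log, update the memtable, flush to SSTables) can be sketched roughly as follows. This is a deliberately simplified toy, not how DSE or Cassandra is implemented: the class name `ToyNode`, the `memtable_limit` threshold, and clearing the whole commit log on flush are assumptions made for illustration. The sample rows are the ones shown on the slide.

```python
class ToyNode:
    """Toy model of the write path: commit log append, memtable update, flush."""

    def __init__(self, memtable_limit=4):
        self.commit_log = []   # stands in for the append-only log on disk
        self.memtable = {}     # in-memory rows keyed by ID
        self.sstables = []     # immutable, sorted snapshots "on disk"
        self.memtable_limit = memtable_limit  # hypothetical flush threshold

    def write(self, row_id, name, dob):
        self.commit_log.append((row_id, name, dob))  # 1. sequential append for durability
        self.memtable[row_id] = (name, dob)          # 2. update the in-memory table
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out as a sorted, immutable SSTable."""
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log = []   # simplification: flushed data needs no replay

    def read(self, row_id):
        """Newest data wins: check the memtable, then SSTables newest first."""
        if row_id in self.memtable:
            return self.memtable[row_id]
        for sstable in reversed(self.sstables):
            for key, value in sstable:
                if key == row_id:
                    return value
        return None


node = ToyNode()
node.write("AB1", "John Smith", "10/11/1972")
node.write("AB2", "Bob Jones", "3/1/1964")
node.write("ZZ3", "Mike West", "4/22/1968")
node.flush()                                     # first three rows become an SSTable
node.write("AB3", "Mary Smith", "1/11/1982")
node.write("AB4", "Jane Hess", "3/1/1992")
node.write("AB1", "Jonny Smith", "10/11/1972")   # newer version of AB1, still in memory

print(node.read("AB1"))  # the memtable version shadows the older SSTable row
```

Writes never modify an existing SSTable; the newer AB1 row simply takes precedence at read time, and the old and new versions are reconciled later.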

Putting it to use

Integrated Multi-Model/Mixed Workload Platform

• Indexing & Search
• Streaming Analytics
• Graph
• Batch Analytics

DSE Search

Live indexing engine with powerful search

• Automatic indexing on insert

• Higher ingestion throughput

• Distributed query optimization

Compared to self-managed:

• No separate search cluster to manage

• Probably less total hardware required

• No “Split Brain” data inconsistencies

• No ETL or sync to build and maintain

• No app-level data management code

[Diagram: your application connects via CQL to Search + Cassandra]

DSE Analytics

[Diagram: single DSE cluster. Your application runs real-time operations against Cassandra, your analytics run analytics queries against Analytics, and real-time replication connects the two]

Streaming, ad-hoc, and batch

• High-performance

• Workload management

• SQL reporting

Compared to self-managed:

• No ETL

• True HA without ZooKeeper

DSE Graph

A scalable, distributed graph database that is optimized for storing, traversing, and querying complex graph data in real time

• Captures the value in relationships between data
• DSE Analytics and Search integrated
• Well suited to use cases such as Customer360, Recommendations, and Fraud Detection

©2016 DataStax

RDBMS vs. Graph

• A key difference between a graph database and an RDBMS is how relationships between entities/vertexes are prioritized and managed.
• While an RDBMS uses foreign keys to connect entities in a secondary fashion, edges (the relationships) in a graph database are of first-order importance.
• Relationships are explicitly embedded in a graph data model.
• A graph-shaped business problem is one in which the concern is more with the relationships (edges) among entities (vertexes) than with the entities in isolation.

Concept | RDBMS | Graph
An identifiable “something” or object to keep track of | Entity | Vertex
A connection or reference between two objects | Relationship | Edge
A characteristic of an object | Attribute | Property
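The vertex, edge, and property vocabulary can be illustrated with a tiny in-memory property graph. This is a plain-Python sketch, not DSE Graph or Gremlin: the customers, products, the "bought" edge label, and the `recommend` helper are all invented for illustration. It shows a two-hop traversal of the kind a recommendation use case relies on.

```python
from collections import defaultdict

# Vertices carry properties; edges are first-class records with a label.
vertices = {
    "alice": {"type": "customer"},
    "bob": {"type": "customer"},
    "laptop": {"type": "product"},
    "phone": {"type": "product"},
}
edges = [
    ("alice", "bought", "laptop"),
    ("bob", "bought", "laptop"),
    ("bob", "bought", "phone"),
]

# Index edges in both directions so traversals can follow them either way.
out_edges = defaultdict(list)
in_edges = defaultdict(list)
for src, label, dst in edges:
    out_edges[src].append((label, dst))
    in_edges[dst].append((label, src))


def recommend(customer):
    """Products bought by customers who bought the same things (two hops)."""
    owned = {dst for label, dst in out_edges[customer] if label == "bought"}
    peers = {src for product in owned
             for label, src in in_edges[product]
             if label == "bought" and src != customer}
    candidates = {dst for peer in peers
                  for label, dst in out_edges[peer]
                  if label == "bought" and vertices[dst]["type"] == "product"}
    return sorted(candidates - owned)


print(recommend("alice"))
```

In a relational schema the same question would typically need several self-joins on a purchases table; here each hop is a direct lookup along stored edges.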

RDBMS vs. Graph

RDBMS | Graph
Simple to moderate data complexity | Heavy data complexity
Hundreds of potential relationships | Hundreds of thousands to millions or billions of potential relationships
Moderate JOIN operations with good performance | Heavy to extreme JOIN operations required
Infrequent to no data model changes | Constantly changing and evolving data model
Static to semi-static data changes | Dynamic and constantly changing data
Primarily structured data | Structured and unstructured data
Nested or complex transactions | Simple transactions
Always strongly consistent | Tunable consistency (eventual to strong)
High availability (handled with failover) | Continuous availability (no downtime)
Centralized application that is location dependent (e.g. single location), especially for write operations and not just read | Distributed application that is location independent (multiple locations involving multiple data centers and/or clouds) for write and read operations
Scale up for increased performance | Scale out for increased performance (for some graph DBs)
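The "tunable consistency (eventual to strong)" row can be made concrete with replica-count arithmetic. In Cassandra-style systems, a read is guaranteed to see the latest acknowledged write when the replicas read plus the replicas written exceed the replication factor. The sketch below assumes a replication factor of 3 purely for illustration; the function names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """Reads overlap the latest write when R + W > RF."""
    return read_replicas + write_replicas > replication_factor


RF = 3  # assumed replication factor
q = quorum(RF)
print(q, is_strongly_consistent(q, q, RF))  # quorum reads and quorum writes
print(is_strongly_consistent(1, 1, RF))     # one replica each way: eventual only
```

Choosing fewer replicas per operation trades consistency for latency and availability, which is the sense in which consistency is tunable per request.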

Intelligent Data Layer

• Logic

• Learning

• Understanding

• Reasoning

• Retention

Deloitte: Mission Graph

1. Explore and Analyze: Connect the dots across multiple datasets with a single search.
2. Visualize Sophisticated Networks: Create network diagrams that illustrate relationships and identify associations.
3. Enrich Analysis with Public Data: Link open-source data to help improve insight into high-value entities.
4. Link Unstructured Relationships: Extract unstructured data and pair it to structured data.
5. Machine Learning: Continuously improve targeting algorithms through intelligent machine learning. The more you use Mission Graph, the smarter it gets.
6. Proactively Manage Network Risk: Diagnose how influencers and events create risk to identify similar patterns and help prevent future incidents.

Build and Manage

Advanced security

External Authentication

• External validation of authorized users
• Leverages Kerberos & LDAP/AD
• Single sign-on to all data domains

Transparent Data Encryption

• Protects sensitive data at rest via encryption
• No changes needed at app level

Data Auditing

• Audit trail of all accesses and changes
• Control to audit only what’s needed
• Uses log4j interface to ensure performant and efficient auditing

Build how you want

Drivers

• ODBC

• JDBC

DevCenter

Friendly GUI for CQL developers

• Visually Create and Navigate Database Objects
• Tune Queries for Faster Performance
• Powerful Context-Aware Editor

Studio

Explore, query, and analyze DSE Graph

• Gremlin Query Language
• Auto-completion, result set visualization, execution management, and much more.
• Friendly Fluent API

OpsCenter

Visual management for DSE

• Automate what no one likes: backups, repairs
• REST API to work in your world
• Instantly manage your cluster, scaling up or down at a moment’s notice
• Monitor your cluster and follow best practices, ensuring a secure environment

Q&A
