Reltio: Powering Enterprise Data-driven Applications with Cassandra


Upload: datastax-academy

Post on 08-Jan-2017


TRANSCRIPT

Page 1: Reltio: Powering Enterprise Data-driven Applications with Cassandra

Powering Enterprise Data-driven Applications with Cassandra

Page 2: Reltio: Powering Enterprise Data-driven Applications with Cassandra

Be Right Faster with Reliable Data, Relevant Insights, Recommended Actions™

#DataManagement

#BigData

#ML

© 2015. All Rights Reserved.

Page 3: Reltio: Powering Enterprise Data-driven Applications with Cassandra

Anastasia Zamyshlyaeva
VP Platform Product Management and Co-founder @ Reltio

• 2011 – started working with C*
• 2012 – selected C* as the persistence store for creating a hybrid Columnar & Graph data store
• Since 2012 – running in production to support:
  – 24/7 uptime with 99.995% availability
  – Multi-tenancy across customers
  – Both operational and analytical workloads

[email protected]/in/azamyshlyaeva


Page 4: Reltio: Powering Enterprise Data-driven Applications with Cassandra

“If you focus on the smallest details, you never get the big picture right”

~ Leroy Hood

Page 5: Reltio: Powering Enterprise Data-driven Applications with Cassandra


Page 6: Reltio: Powering Enterprise Data-driven Applications with Cassandra


Page 7: Reltio: Powering Enterprise Data-driven Applications with Cassandra


Page 8: Reltio: Powering Enterprise Data-driven Applications with Cassandra

[Diagram labels: Sales, Web site, Support, Supply, Marketing]

Page 9: Reltio: Powering Enterprise Data-driven Applications with Cassandra

[Diagram labels: Sales, Web site, Supply, Marketing, Support]

Page 10: Reltio: Powering Enterprise Data-driven Applications with Cassandra

[Diagram labels: Sales, Web site, Supply, Marketing, Support]

Page 11: Reltio: Powering Enterprise Data-driven Applications with Cassandra

Enterprise Applications Ecosystem

Is data up-to-date?

Is data correct?

Is data complete?

Page 12: Reltio: Powering Enterprise Data-driven Applications with Cassandra


Page 13: Reltio: Powering Enterprise Data-driven Applications with Cassandra

[Diagram labels: Sales, Web site, Supply, Marketing, Support]

Data Unification Application (based on Relational Databases)
• Fixed structure
• No big data
• Expensive
• Hard to support graphs and complex attributes
• Single point of failure (often)

Page 14: Reltio: Powering Enterprise Data-driven Applications with Cassandra

[Diagram labels: Sales, Web site, Supply, Marketing, Support]

(based on Cassandra)

Page 15: Reltio: Powering Enterprise Data-driven Applications with Cassandra

Why Cassandra?
✓ High performance
✓ Fault tolerance
✓ Linear scalability
✓ Multi-datacenter

Page 16: Reltio: Powering Enterprise Data-driven Applications with Cassandra

Reltio Metadata-driven Model and Operations

Doctors and Hospitals
Schema
configure
UI, REST API, Analytics

Page 17: Reltio: Powering Enterprise Data-driven Applications with Cassandra

Reltio Metadata-driven Model and Operations

Oil & Gas
Schema
configure
UI, REST API, Analytics

Page 18: Reltio: Powering Enterprise Data-driven Applications with Cassandra

Reltio Metadata-driven Model and Operations

Asset Catalog
Schema
configure
UI, REST API, Analytics

AMan

Page 19: Reltio: Powering Enterprise Data-driven Applications with Cassandra

Cassandra is a primary datastore

Page 20: Reltio: Powering Enterprise Data-driven Applications with Cassandra

Metadata-driven Documents in Columnar storage

Simple metadata-driven attributes in Cassandra (Thrift API)

Metadata
Entity type: Individual
- Name: String
- Email: List
- Address: Complex
  - State: String
  - Type: List

Entity
ID: doc1
Type: Individual
Name: John
Email: [email protected]
       [email protected]
Address: CA, shipping
         NY, billing

Row doc1:
<Name>.1 = John
…

Page 21: Reltio: Powering Enterprise Data-driven Applications with Cassandra

Metadata-driven Documents in Columnar storage

Multi-value metadata-driven attributes in Cassandra (Thrift API)

Metadata
Entity type: Individual
- Name: String
- Email: List
- Address: Complex
  - State: String
  - Type: List

Entity
ID: doc1
Type: Individual
Name: John
Email: [email protected]
       [email protected]
Address: CA, shipping
         NY, billing

Row doc1:
…
<Email>.1 = [email protected]
<Email>.2 = [email protected]
…

Page 22: Reltio: Powering Enterprise Data-driven Applications with Cassandra

Metadata-driven Documents in Columnar storage

Complex metadata-driven attributes in Cassandra (Thrift API)

Metadata
Entity type: Individual
- Name: String
- Email: List
- Address: Complex
  - State: String
  - Type: List

Entity
ID: doc1
Type: Individual
Name: John
Email: [email protected]
       [email protected]
Address: CA, shipping (1)
         NY, billing (2)

Row doc1:
…
<Address>.1.<State>.1 = CA
<Address>.1.<Type>.1 = shipping
<Address>.2.<State>.1 = NY
…
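The column naming on the three slides above (simple, multi-value, complex) can be sketched as a small flattening routine. This is only an illustration of the naming scheme shown on the slides, not Reltio's implementation: the function name `flatten`, the dictionary layout of `doc1`, and the placeholder e-mail addresses (the real ones are obfuscated in this transcript) are all assumptions.

```python
from typing import Any, Dict


def flatten(attrs: Dict[str, Any], prefix: str = "") -> Dict[str, str]:
    """Flatten nested attributes into '<Name>.<index>' column names."""
    columns: Dict[str, str] = {}
    for name, value in attrs.items():
        # Treat every attribute as a list so that single values get index 1.
        values = value if isinstance(value, list) else [value]
        for index, item in enumerate(values, start=1):
            column = f"{prefix}<{name}>.{index}"
            if isinstance(item, dict):
                # Complex attribute: recurse, extending the column path.
                columns.update(flatten(item, prefix=column + "."))
            else:
                columns[column] = str(item)
    return columns


# Hypothetical in-memory form of the doc1 entity from the slides.
doc1 = {
    "Name": "John",
    "Email": ["first@example.com", "second@example.com"],  # placeholder values
    "Address": [
        {"State": "CA", "Type": ["shipping"]},
        {"State": "NY", "Type": ["billing"]},
    ],
}

row = flatten(doc1)
for column in sorted(row):
    print(column, "=", row[column])
```

Each nesting level adds one `<Attribute>.<index>` segment, so all values of one address share a common column-name prefix.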

Page 23: Reltio: Powering Enterprise Data-driven Applications with Cassandra

Metadata-driven Documents – CQL wide rows

CREATE TABLE ENTITIES (
    doc_id int,              -- partition key: one wide row per document
    attribute_name text,     -- clustering column, sorted within the partition
    attribute_value text,
    …
    PRIMARY KEY (doc_id, attribute_name)
);

-- select all addresses
SELECT *
FROM ENTITIES
WHERE doc_id = 1
  AND attribute_name >= 'Address.0'
  AND attribute_name <= 'Address.9';
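The range query works because attribute names are kept sorted inside one partition, so all addresses sit next to each other. A minimal sketch of that idea in plain Python, with a sorted list standing in for the partition and `bisect` standing in for the slice; the attribute names such as `Address.1.State` and the helper `range_slice` are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right

# One partition (doc_id = 1): cells are kept sorted by attribute name,
# as clustering columns are within a wide row.
partition = sorted([
    ("Address.1.State", "CA"),
    ("Address.1.Type", "shipping"),
    ("Address.2.State", "NY"),
    ("Address.2.Type", "billing"),
    ("Email.1", "first@example.com"),  # placeholder value
    ("Name.1", "John"),
])
names = [name for name, _ in partition]


def range_slice(start: str, end: str):
    """Return cells whose attribute name lies in [start, end] (inclusive)."""
    lo = bisect_left(names, start)
    hi = bisect_right(names, end)
    return partition[lo:hi]


addresses = range_slice("Address.0", "Address.9")
for name, value in addresses:
    print(name, "=", value)
```

Under plain string ordering, a name such as `Address.9.State` sorts after `Address.9`, so the upper bound would need to be widened for an entity with nine or more addresses.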

Page 24: Reltio: Powering Enterprise Data-driven Applications with Cassandra

Hybrid Graphs - linked entities with infinite attribution

[Graph diagram: nodes John, Dwight, DunderMifflin, CopyPaper; labels Employee, Individual, Product, Organization]

Cassandra
- Records storage across datacenters

Reltio
- Metadata-driven graphs
- Rich model for entities, relations
- Partitioning
- Effective joins
- Graph operations

Page 25: Reltio: Powering Enterprise Data-driven Applications with Cassandra

Reltio de-duplication

John Smith

Jon Smith
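The slide does not say how matching is done. As a rough illustration of flagging "John Smith" and "Jon Smith" as potential duplicates, here is a sketch using the standard-library `difflib`; the helper names and the 0.9 threshold are arbitrary assumptions, and a production matcher would likely use richer rules.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] computed by difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_potential_duplicate(a: str, b: str, threshold: float = 0.9) -> bool:
    # The threshold is an illustrative value, not a tuned one.
    return similarity(a, b) >= threshold


print(is_potential_duplicate("John Smith", "Jon Smith"))
print(is_potential_duplicate("John Smith", "Dwight"))
```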

Page 26: Reltio: Powering Enterprise Data-driven Applications with Cassandra

Hybrid Search – without documents!

Cassandra + Elasticsearch* = Hybrid search
* excluded documents

[Charts comparing Elasticsearch alone with hybrid search (Elasticsearch + Cassandra): data volume in Elasticsearch index (TB); Elasticsearch indexing performance (OPS); search performance on large documents (sec)]
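One plausible reading of "without documents" is that the search index holds only searchable terms and document IDs, while full documents are fetched from the primary store. The slide gives no further detail, so the sketch below is an assumption-laden illustration: in-memory dictionaries stand in for Cassandra and Elasticsearch, and `index_document` and `hybrid_search` are hypothetical names.

```python
from typing import Dict, List, Set

# Stand-in for the primary store (Cassandra on the slide): full documents by ID.
primary_store: Dict[str, Dict[str, str]] = {
    "doc1": {"Name": "John", "State": "CA"},
    "doc2": {"Name": "Dwight", "State": "NY"},
}

# Stand-in for the search index (Elasticsearch on the slide): it maps terms
# to document IDs only, so no document bodies are kept in the index.
search_index: Dict[str, Set[str]] = {}


def index_document(doc_id: str, doc: Dict[str, str]) -> None:
    """Index every attribute value as a lowercase term pointing at the ID."""
    for value in doc.values():
        search_index.setdefault(value.lower(), set()).add(doc_id)


def hybrid_search(term: str) -> List[Dict[str, str]]:
    """Resolve IDs from the index, then fetch bodies from the primary store."""
    doc_ids = sorted(search_index.get(term.lower(), set()))
    return [primary_store[doc_id] for doc_id in doc_ids]


for doc_id, doc in primary_store.items():
    index_document(doc_id, doc)

print(hybrid_search("john"))
```

Keeping bodies out of the index would be consistent with the smaller index volume the charts compare, at the cost of a second lookup per hit.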

Page 27: Reltio: Powering Enterprise Data-driven Applications with Cassandra

Reltio Cloud Data Components

Spark

AWS

AWS Redshift

Cassandra

Elasticsearch

Page 28: Reltio: Powering Enterprise Data-driven Applications with Cassandra

Reltio Use Cases

AManag

Page 29: Reltio: Powering Enterprise Data-driven Applications with Cassandra

Thank you