introduction to polyglot persistence
![Page 1: Introduction to Polyglot Persistence](https://reader035.vdocument.in/reader035/viewer/2022081417/58a80a021a28ab3d6e8b560b/html5/thumbnails/1.jpg)
Introduction to Polyglot Persistence
Antonios Giannopoulos, Database Administrator at ObjectRocket by Rackspace
FOSSCOMM 2016
![Page 2: Introduction to Polyglot Persistence](https://reader035.vdocument.in/reader035/viewer/2022081417/58a80a021a28ab3d6e8b560b/html5/thumbnails/2.jpg)
Background
- 14 years in databases and system engineering
- NoSQL DBA @ ObjectRocket by Rackspace
- Passionate about MongoDB & Cassandra
![Page 3: Introduction to Polyglot Persistence](https://reader035.vdocument.in/reader035/viewer/2022081417/58a80a021a28ab3d6e8b560b/html5/thumbnails/3.jpg)
What is Polyglot Persistence?
A set of applications that use several core database technologies
Application Layer
- Key/Value Store
- Column-family Database
- Relational Database
![Page 4: Introduction to Polyglot Persistence](https://reader035.vdocument.in/reader035/viewer/2022081417/58a80a021a28ab3d6e8b560b/html5/thumbnails/4.jpg)
What is Polyglot Persistence?
Using the right tool for the right use case
![Page 5: Introduction to Polyglot Persistence](https://reader035.vdocument.in/reader035/viewer/2022081417/58a80a021a28ab3d6e8b560b/html5/thumbnails/5.jpg)
Let there be RDBMS
“Monoglot” was (and still is) fine for a simple application (one type of workload)
But… applications become complex
A simple e-commerce platform must have:
- Session data (Add to Basket)
- Search engine (Search for products)
- Recommendation engine (Customers Who Bought This Item Also Bought)
- Payment platform
- Geolocation service
![Page 6: Introduction to Polyglot Persistence](https://reader035.vdocument.in/reader035/viewer/2022081417/58a80a021a28ab3d6e8b560b/html5/thumbnails/6.jpg)
Applications growing rapidly
![Page 7: Introduction to Polyglot Persistence](https://reader035.vdocument.in/reader035/viewer/2022081417/58a80a021a28ab3d6e8b560b/html5/thumbnails/7.jpg)
An RDBMS tale
In the good old days “monoglot” = RDBMS
Once upon a time there was an RDBMS with performance issues, and we tried:
- Vertical scaling
- Secondary indexes
- Partitioning
- Denormalization
- Read-only slaves
- Sharding
- Start separating workloads
But… the more we scale, the more features we miss
![Page 8: Introduction to Polyglot Persistence](https://reader035.vdocument.in/reader035/viewer/2022081417/58a80a021a28ab3d6e8b560b/html5/thumbnails/8.jpg)
The 3V Era
- Volume: Amount of data
- Velocity: Speed of data processing
- Variety: Number of types of data
New databases introduced for Big Data
General-purpose DB is no longer on-trend
Per-use-case datastores are becoming more popular
The rise of Polyglot Persistence
![Page 9: Introduction to Polyglot Persistence](https://reader035.vdocument.in/reader035/viewer/2022081417/58a80a021a28ab3d6e8b560b/html5/thumbnails/9.jpg)
CAP theorem
![Page 10: Introduction to Polyglot Persistence](https://reader035.vdocument.in/reader035/viewer/2022081417/58a80a021a28ab3d6e8b560b/html5/thumbnails/10.jpg)
Picking the right tools – Data Structure
- Relational databases (Oracle, MySQL)
- Key-value stores (Redis, Riak)
- Column-family stores (Cassandra, HBase)
- Document databases (MongoDB, CouchDB)
- Graph databases (Neo4j)
![Page 11: Introduction to Polyglot Persistence](https://reader035.vdocument.in/reader035/viewer/2022081417/58a80a021a28ab3d6e8b560b/html5/thumbnails/11.jpg)
Relational Databases
Based on the relational model (Codd)
Data is organized in tables (rows and columns)
Uses SQL (Structured Query Language)
![Page 12: Introduction to Polyglot Persistence](https://reader035.vdocument.in/reader035/viewer/2022081417/58a80a021a28ab3d6e8b560b/html5/thumbnails/12.jpg)
Relational Databases
Use when:
- Your dataset is relational
- Strong consistency is needed
- Access patterns are unknown
But… doesn’t scale well horizontally
Use cases:
- Everywhere, due to early adoption
- Payment systems
![Page 13: Introduction to Polyglot Persistence](https://reader035.vdocument.in/reader035/viewer/2022081417/58a80a021a28ab3d6e8b560b/html5/thumbnails/13.jpg)
Key/Value stores
A big hash map (associative array)
- Very simple: one key <-> one value
- Very fast reads/writes
- No secondary indexes
{key (VRN)} => {value (car facts)}
{YYY0000} => [make#Ford model#Fiesta year#2010]
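A minimal Python sketch of the key/value idea, using an in-memory dict as a stand-in for a store such as Redis or Riak. The `put` and `get` helper names are hypothetical; the VRN and car facts come from the slide.

```python
# In-memory stand-in for a key/value store: one key maps to one opaque value.
store = {}

def put(key, value):
    """Store the value under the key, replacing any previous value."""
    store[key] = value

def get(key):
    """Return the value for the key, or None when the key is absent."""
    return store.get(key)

# The store does not look inside the value; here it is the string from the slide.
put("YYY0000", "make#Ford model#Fiesta year#2010")

print(get("YYY0000"))
print(get("ZZZ9999"))  # unknown key

# No secondary indexes: finding every Ford means the application
# has to scan all values itself.
fords = [key for key, value in store.items() if "make#Ford" in value]
print(fords)
```

Lookups by key stay cheap, while the last line shows why anything other than a key-based operation becomes the application's problem.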
![Page 14: Introduction to Polyglot Persistence](https://reader035.vdocument.in/reader035/viewer/2022081417/58a80a021a28ab3d6e8b560b/html5/thumbnails/14.jpg)
Key/Value stores
Use when:
- Operations are based on the key
- Data is not highly related
- Only basic CRUD is needed
But… complex queries are painful
Use cases:
- Session data
- User profiles/preferences
- Shopping carts
![Page 15: Introduction to Polyglot Persistence](https://reader035.vdocument.in/reader035/viewer/2022081417/58a80a021a28ab3d6e8b560b/html5/thumbnails/15.jpg)
Document Databases
Nested structures of keys and their values
- Very flexible schema (JSON, XML)
- One key, one value, but the value is visible to queries
- Supports hierarchical data
- Supports secondary indexes
{"id" (VRN)} => {"document" (car facts)}
{YYY0000} => { "make": "Ford", "model": "Fiesta", "year": 2010 }
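To illustrate "the value is visible to queries", here is a small Python sketch that builds a secondary index over one field of a set of documents. It uses plain dicts rather than a real document database; `build_index` and the two extra cars are hypothetical additions for the example.

```python
from collections import defaultdict

# Documents keyed by id (VRN). Schemas differ from document to document.
documents = {
    "YYY0000": {"make": "Ford", "model": "Fiesta", "year": 2010},
    "XXX1111": {"make": "Ford", "model": "Focus", "owner": {"city": "Athens"}},
    "ZZZ2222": {"make": "Fiat", "model": "Panda"},
}

def build_index(docs, field):
    """Map each value of a top-level field to the ids of documents holding it."""
    index = defaultdict(list)
    for doc_id, doc in docs.items():
        if field in doc:
            index[doc[field]].append(doc_id)
    return index

# Because the store can see inside the value, it can index the "make" field.
make_index = build_index(documents, "make")
ford_ids = sorted(make_index["Ford"])
print(ford_ids)
```

Unlike the key/value case, the lookup by `make` does not require scanning every stored value at query time.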
![Page 16: Introduction to Polyglot Persistence](https://reader035.vdocument.in/reader035/viewer/2022081417/58a80a021a28ab3d6e8b560b/html5/thumbnails/16.jpg)
Document Databases
Use when:
- You don’t know much about the schema
- Data is unstructured and heterogeneous
But… joins and references are tricky, and denormalization requires more space
Use cases:
- Product catalog
- CMS
- Event logging from different sources
![Page 17: Introduction to Polyglot Persistence](https://reader035.vdocument.in/reader035/viewer/2022081417/58a80a021a28ab3d6e8b560b/html5/thumbnails/17.jpg)
Column family
In a table, data of the same column is stored together (a key-value store whose values are themselves key-value maps)
- Data organized as columns
- Great for sparse tables
- Very fast column operations, including aggregation
{"id" (VRN)} => {"column families" (car facts)}
{YYY0000} => { "car": {"make": "Ford", "model": "Fiesta" …}, "parts": {…}, "service": {…} }
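The "key-value whose value is key-value" shape can be sketched in Python as nested dicts: row key, then column family, then column. This only illustrates the logical model, not how Cassandra or HBase lay data out on disk; `read_family`, `read_column`, the second row and the service entries are hypothetical.

```python
# row key -> column family -> column -> value
table = {
    "YYY0000": {
        "car": {"make": "Ford", "model": "Fiesta"},
        "service": {"2015-03-01": "oil change", "2016-01-15": "brake pads"},
    },
    "XXX1111": {
        "car": {"make": "Fiat"},  # sparse row: no model, no service family
    },
}

def read_family(row_key, family):
    """Return one column family of one row, or an empty dict if it is absent."""
    return table.get(row_key, {}).get(family, {})

def read_column(family, column):
    """Collect a single column across all rows, skipping rows that lack it."""
    return {
        row_key: families[family][column]
        for row_key, families in table.items()
        if column in families.get(family, {})
    }

makes = read_column("car", "make")
models = read_column("car", "model")
print(makes)
print(models)
```

Missing columns cost nothing in this layout, which is why the model suits sparse tables.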
![Page 18: Introduction to Polyglot Persistence](https://reader035.vdocument.in/reader035/viewer/2022081417/58a80a021a28ab3d6e8b560b/html5/thumbnails/18.jpg)
Column family
Use when:
- Big Data (huge write volumes)
- Versioning (time-series data)
But… you need to know your statements (queries) in advance, and schema design is not trivial
Use cases:
- Time-series data
- Bidding platforms
- Playlists
![Page 19: Introduction to Polyglot Persistence](https://reader035.vdocument.in/reader035/viewer/2022081417/58a80a021a28ab3d6e8b560b/html5/thumbnails/19.jpg)
Graph Databases
Inspired by graph theory
Nodes hold data/entities
Each node connects to others using attribute(s)
![Page 20: Introduction to Polyglot Persistence](https://reader035.vdocument.in/reader035/viewer/2022081417/58a80a021a28ab3d6e8b560b/html5/thumbnails/20.jpg)
Graph Databases
Use when:
- Data is highly interconnected
- You define explicit relationships and need traversal queries
But… doesn’t scale well horizontally
Use cases:
- Where 3rd-degree (or higher) relationships are needed
- Social media
- Queries like “friend of a friend”
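A "friend of a friend" query is a bounded graph traversal. The sketch below runs a breadth-first search over an adjacency map in plain Python; it is not a graph database API, and the names in `friends` and the `within_degree` helper are made up for the example.

```python
from collections import deque

# Undirected friendship graph as an adjacency map (hypothetical data).
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol", "erin"},
    "erin": {"dave"},
}

def within_degree(graph, start, max_degree):
    """Return {person: degree} for everyone reachable in at most max_degree hops."""
    degree = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if degree[person] == max_degree:
            continue  # do not expand beyond the requested depth
        for neighbour in graph.get(person, ()):
            if neighbour not in degree:
                degree[neighbour] = degree[person] + 1
                queue.append(neighbour)
    del degree[start]
    return degree

reach = within_degree(friends, "alice", 3)
friends_of_friends = sorted(p for p, d in reach.items() if d == 2)
third_degree = sorted(p for p, d in reach.items() if d == 3)
print(friends_of_friends, third_degree)
```

In a relational schema each extra degree typically means another self-join, whereas here it is one more level of the same traversal.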
![Page 21: Introduction to Polyglot Persistence](https://reader035.vdocument.in/reader035/viewer/2022081417/58a80a021a28ab3d6e8b560b/html5/thumbnails/21.jpg)
What about Hadoop?
Framework for distributed storage and processing of large data sets
Use for:
- ETL: read raw data, apply filters, create a structured summary
- Exploration engine / data discovery
- Data archive / massive storage
But… do not use Hadoop as a database replacement
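The ETL use (read raw data, apply filters, create a structured summary) can be sketched on a single machine in plain Python. This is not Hadoop code and uses no Hadoop API; the log format and the `extract`, `transform` and `load` names are assumptions for illustration.

```python
from collections import Counter

# Raw, partly malformed input lines (hypothetical web access log).
raw_lines = [
    "2016-04-16 GET /products/42 200",
    "2016-04-16 GET /products/42 500",
    "malformed line",
    "2016-04-16 GET /basket 200",
    "2016-04-17 GET /products/42 200",
]

def extract(lines):
    """Parse well-formed lines into records and drop the rest."""
    for line in lines:
        parts = line.split()
        if len(parts) == 4:
            date, _method, path, status = parts
            yield {"date": date, "path": path, "status": int(status)}

def transform(records):
    """Apply a filter: keep successful requests only."""
    return (r for r in records if r["status"] == 200)

def load(records):
    """Build the structured summary: successful hits per (date, path)."""
    summary = Counter()
    for r in records:
        summary[(r["date"], r["path"])] += 1
    return summary

summary = load(transform(extract(raw_lines)))
print(dict(summary))
```

The output is a small structured table that a database can then serve, which matches the slide's point that Hadoop feeds a datastore rather than replacing one.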
![Page 22: Introduction to Polyglot Persistence](https://reader035.vdocument.in/reader035/viewer/2022081417/58a80a021a28ab3d6e8b560b/html5/thumbnails/22.jpg)
Picking the right tools - Questions
Data structure:
- Does it have a natural structure? Is it unstructured?
- How are the pieces connected to each other?
- How is it distributed?
- How much data is there?
Access patterns:
- Read/write ratio?
- Uniform or random?
- What is more important?
![Page 23: Introduction to Polyglot Persistence](https://reader035.vdocument.in/reader035/viewer/2022081417/58a80a021a28ab3d6e8b560b/html5/thumbnails/23.jpg)
Picking the right tools - Questions
Organization needs:
- Do I need authentication? What type?
- Do I need encryption?
- Do I need backups?
- Do I need a disaster site?
- What level of monitoring?
- Drivers, languages?
Tools:
- Third-party tools
- Add-ons/plugins
![Page 24: Introduction to Polyglot Persistence](https://reader035.vdocument.in/reader035/viewer/2022081417/58a80a021a28ab3d6e8b560b/html5/thumbnails/24.jpg)
Picking the right tools - Questions
Maturity:
- How long has it been on the market?
Documentation:
- Books, tutorials
- Training
Type of support:
- Community
- Commercial support
![Page 25: Introduction to Polyglot Persistence](https://reader035.vdocument.in/reader035/viewer/2022081417/58a80a021a28ab3d6e8b560b/html5/thumbnails/25.jpg)
Challenges
Define the architecture
- Decide which datastore will store which data
- A wrong decision can lead to painful migration(s)
Deployment complexity
- Provision different types of HW, OS, patches …
- Backup/restore
- Control configuration changes
- Monitor all the different components
![Page 26: Introduction to Polyglot Persistence](https://reader035.vdocument.in/reader035/viewer/2022081417/58a80a021a28ab3d6e8b560b/html5/thumbnails/26.jpg)
Challenges
Application complexity
- A different connection per datastore
- Handle different types of errors
- Map different results on the application layer
- Keep datastores in sync (cross-database consistency)
  - Active/Passive topology
  - Active/Active topology
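One common way to contain this complexity is a thin data-access layer that hides each backend's connection, error style and result shape. The Python sketch below uses two in-memory classes as stand-ins for real drivers; every class and method name here is hypothetical.

```python
class DatastoreError(Exception):
    """Single application-level error type, whichever backend failed."""

class SessionStore:
    """Stand-in for a key/value backend: raises KeyError on a missing key."""
    def __init__(self):
        self._data = {"s1": {"basket": ["sku-42"]}}

    def fetch(self, session_id):
        return self._data[session_id]

class OrderStore:
    """Stand-in for a relational backend: returns a row tuple, or None."""
    def __init__(self):
        self._rows = {1001: (1001, "sku-42", 19.99)}

    def select_order(self, order_id):
        return self._rows.get(order_id)

class Repository:
    """Maps both backends to one error type and to plain dicts/lists."""
    def __init__(self, sessions, orders):
        self._sessions = sessions
        self._orders = orders

    def basket(self, session_id):
        try:
            return list(self._sessions.fetch(session_id)["basket"])
        except KeyError as exc:
            raise DatastoreError(f"session {session_id!r} not found") from exc

    def order(self, order_id):
        row = self._orders.select_order(order_id)
        if row is None:
            raise DatastoreError(f"order {order_id!r} not found")
        found_id, sku, price = row
        return {"id": found_id, "sku": sku, "price": price}

repo = Repository(SessionStore(), OrderStore())
print(repo.basket("s1"))
print(repo.order(1001))
```

The rest of the application only sees `Repository` and `DatastoreError`, so adding or swapping a datastore touches one layer instead of every caller.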
Training for Devs and Ops
- Develop new skills for your teams
- Support for a period of time
![Page 27: Introduction to Polyglot Persistence](https://reader035.vdocument.in/reader035/viewer/2022081417/58a80a021a28ab3d6e8b560b/html5/thumbnails/27.jpg)
Don’ts and anti-patterns
Over-engineering
- Keep it simple
- Remove pieces that don’t add value
Conforming to stereotypes
- Use cases are general guidelines
- Benchmarks are indicators
Staying static
- Try new technologies
- Be on-trend but cautious
![Page 28: Introduction to Polyglot Persistence](https://reader035.vdocument.in/reader035/viewer/2022081417/58a80a021a28ab3d6e8b560b/html5/thumbnails/28.jpg)
Use Case: Mongo – ElasticSearch
MongoDB uses B-Tree indexes for everything
B-Trees are great but do not excel at every use case
We started hitting hard limits for certain use cases, such as full-text search
The Lucene engine is a better option for full-text search (it uses inverted indexes)
We already had ElasticSearch in our portfolio, so we just “connected” the two products
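For readers unfamiliar with inverted indexes, here is a toy version in Python: each term maps to the set of documents that contain it, so a multi-word search becomes a set intersection. Lucene adds analysis, scoring and compressed on-disk structures that this sketch omits; the sample texts and function names are hypothetical.

```python
from collections import defaultdict

docs = {
    1: "Ford Fiesta hatchback",
    2: "Ford Focus estate",
    3: "Fiat Panda hatchback",
}

def build_inverted_index(documents):
    """Map each lower-cased term to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

index = build_inverted_index(docs)
print(search(index, "ford hatchback"))
print(search(index, "hatchback"))
```

A B-Tree index on the whole string would not help with a query for a word in the middle of the text, which is the limit the slide refers to.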
![Page 29: Introduction to Polyglot Persistence](https://reader035.vdocument.in/reader035/viewer/2022081417/58a80a021a28ab3d6e8b560b/html5/thumbnails/29.jpg)
Use Case: Mongo – ElasticSearch
- The connector uses the Active/Passive model
- Writes must go through MongoDB and are then propagated to ElasticSearch
- Flexible: the user can pick what to propagate:
  - Database(s)
  - Collection(s)
  - Field mapping
  - Indexing/analyzing
![Page 30: Introduction to Polyglot Persistence](https://reader035.vdocument.in/reader035/viewer/2022081417/58a80a021a28ab3d6e8b560b/html5/thumbnails/30.jpg)
Use Case: Mongo – ElasticSearch
- An initial sync is needed (bulk API)
- The connector uses a tailable cursor that reads the MongoDB oplog and propagates the changes
- Similar to Extract, Transform, Load (ETL)
- The application is responsible for directing read requests to the most suitable datastore
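The propagation step can be sketched without either product: the Python below replays a list of simplified oplog-style entries into a dict that stands in for the search index. The `op`, `ns`, `o` and `o2` field names follow the MongoDB oplog, but the entries are simplified (updates are assumed to carry a `$set` document), a real connector would read them from a tailable cursor, and `apply_oplog_entry` and the sample data are hypothetical.

```python
def apply_oplog_entry(index, entry, namespaces):
    """Apply one simplified oplog-style entry to the search-index stand-in."""
    if entry["ns"] not in namespaces:
        return  # this collection was not selected for propagation
    op, doc = entry["op"], entry["o"]
    if op == "i":    # insert: index the document body under its _id
        index[doc["_id"]] = {k: v for k, v in doc.items() if k != "_id"}
    elif op == "u":  # update: "o2" identifies the document, "o" holds the change
        index.setdefault(entry["o2"]["_id"], {}).update(doc["$set"])
    elif op == "d":  # delete: remove the document from the index
        index.pop(doc["_id"], None)

oplog = [
    {"op": "i", "ns": "shop.cars", "o": {"_id": "YYY0000", "make": "Ford", "model": "Fiesta"}},
    {"op": "i", "ns": "shop.sessions", "o": {"_id": "s1", "basket": []}},
    {"op": "u", "ns": "shop.cars", "o2": {"_id": "YYY0000"}, "o": {"$set": {"year": 2010}}},
    {"op": "i", "ns": "shop.cars", "o": {"_id": "XXX1111", "make": "Fiat"}},
    {"op": "d", "ns": "shop.cars", "o": {"_id": "XXX1111"}},
]

search_index = {}
for entry in oplog:
    apply_oplog_entry(search_index, entry, namespaces={"shop.cars"})

print(search_index)
```

Because entries are applied in oplog order and only in one direction, the index trails MongoDB slightly, which is the Active/Passive behaviour the connector relies on.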
![Page 31: Introduction to Polyglot Persistence](https://reader035.vdocument.in/reader035/viewer/2022081417/58a80a021a28ab3d6e8b560b/html5/thumbnails/31.jpg)
Call me Polyglot, Call me Multi-Model
MongoDB (version 3 or higher)
- MMAPv1
- WiredTiger
- In-Memory
Percona Server for MongoDB
- MMAPv1
- WiredTiger
- RocksDB
- PerconaFT
![Page 32: Introduction to Polyglot Persistence](https://reader035.vdocument.in/reader035/viewer/2022081417/58a80a021a28ab3d6e8b560b/html5/thumbnails/32.jpg)
The future(?) Multi Model databases
Support different models within the engine -OR- offer different layers on top of the engine
OrientDB supports graph, document and key/value models
Relationships are managed as in graph databases, with direct connections between records
FoundationDB offers layers on top of a key-value store
![Page 33: Introduction to Polyglot Persistence](https://reader035.vdocument.in/reader035/viewer/2022081417/58a80a021a28ab3d6e8b560b/html5/thumbnails/33.jpg)
Keep in touch
: I iamantonios [email protected]
We are hiring!!! Data Engineers, DevOps, DBAs and more
http://objectrocket.com/careers
https://www.rackspace.com/talent/
![Page 34: Introduction to Polyglot Persistence](https://reader035.vdocument.in/reader035/viewer/2022081417/58a80a021a28ab3d6e8b560b/html5/thumbnails/34.jpg)
Questions?
Thank you!!!