NoSQL: Emerging World of Polyglot Persistence
By : Duc Nguyen, Vu Truong
Content
Part I : Understand
1. Why NoSQL
2. Aggregate Data Models
3. Data Models
4. Distribution Models
5. Consistency
6. Stamps & Map-Reduce
Part II : Implement
1. Key-Value Databases
2. Document Databases
3. Column-Family Stores
4. Graph Databases
5. Schema Migrations & Polyglot Persistence
6. Choosing Your Database
Why NoSQL
The Value of Relational Databases :
● Persistent data
● Concurrency
● Integration
● Standard model
Impedance Mismatch
Attack of the Clusters
Common characteristics of NoSQL :
● Not using the relational model
● Running well on clusters
● Open-source
● Built for 21st-century web estates
● Schemaless
Aggregate Data Models
● An aggregate is a collection of data that we interact with as a unit.
● Aggregates form the boundaries for ACID operations with the database.
● Key-value, document, and column-family databases can all be seen as forms of aggregate-oriented database.
● Aggregates make it easier for the database to manage data storage over clusters.
● Aggregate-oriented databases work best when most data interaction is done with the same aggregate.
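To make the idea of an aggregate concrete, here is a minimal Python sketch. The order structure, the `save_order` and `load_order` helpers, and the in-memory `store` dict are hypothetical stand-ins for an aggregate-oriented database, not the API of any particular product.

```python
import copy

# Hypothetical in-memory stand-in for an aggregate-oriented store:
# each key maps to one whole aggregate.
store = {}

def save_order(order):
    """Write the whole order aggregate in a single operation."""
    store[order["id"]] = copy.deepcopy(order)

def load_order(order_id):
    """Read the whole aggregate back with a single lookup."""
    return copy.deepcopy(store[order_id])

# The order, its shipping address, and its line items travel together as one unit.
order = {
    "id": 99,
    "customer_id": 1,
    "shipping_address": {"city": "Chicago"},
    "line_items": [
        {"product": "NoSQL Distilled", "quantity": 1},
    ],
}
save_order(order)
loaded = load_order(99)
print(loaded["line_items"][0]["product"])
```

Because the whole order lives under one key, a cluster can place it on a single node, which is the property the bullets above rely on.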
Data Models
Complex Schema :
Graph Databases :
Schemaless Databases :
Schemaless databases allow you to freely add fields to records, but there is usually an implicit schema expected by users of the data.
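The implicit schema can be illustrated with a short Python sketch. The records, the `IMPLICIT_SCHEMA` tuple, and the `missing_fields` helper are all hypothetical; the point is only that the code reading the data carries the expectations the database no longer enforces.

```python
# Three records in the same schemaless collection; the second adds a field freely.
records = [
    {"name": "Ann", "email": "ann@example.com"},
    {"name": "Bob", "email": "bob@example.com", "nickname": "bobby"},
    {"name": "Cid"},  # no "email": breaks the reader's implicit schema
]

# Hypothetical implicit schema: fields this reader assumes every record has.
IMPLICIT_SCHEMA = ("name", "email")

def missing_fields(record, required=IMPLICIT_SCHEMA):
    """Return the required fields that the record does not contain."""
    return [field for field in required if field not in record]

# Collect the records that would surprise the reading code.
problems = {r["name"]: missing_fields(r) for r in records if missing_fields(r)}
print(problems)
```

The store accepts all three records; only the application notices that one of them does not fit.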
Distribution Models
Single Server :
Sharding :
Master-Slave Replication :
Peer-to-Peer Replication :
Consistency
Update Consistency :
Write-write conflicts occur when two clients try to write the same data at the same time.
Pessimistic approaches lock data records to prevent conflicts. Optimistic approaches detect conflicts and fix them.
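One common way to implement the optimistic approach is a conditional update: the client writes only if the record still has the version it read. The sketch below is a single-threaded Python illustration; `VersionedStore`, `conditional_write`, and `ConflictError` are hypothetical names, not part of any real database API.

```python
class ConflictError(Exception):
    """Raised when a record changed since the client read it."""

class VersionedStore:
    """Hypothetical store that keeps a version counter next to each value."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        """Return (version, value); unknown keys are at version 0."""
        return self._data.get(key, (0, None))

    def conditional_write(self, key, expected_version, value):
        """Write only if nobody else has written since expected_version."""
        current_version, _ = self.read(key)
        if current_version != expected_version:
            raise ConflictError(
                f"{key}: expected v{expected_version}, found v{current_version}")
        self._data[key] = (current_version + 1, value)
        return current_version + 1

store = VersionedStore()
store.conditional_write("room-101", 0, "free")

# Two clients read the same version of the record.
version_a, _ = store.read("room-101")
version_b, _ = store.read("room-101")

# The first write succeeds; the second is detected as a write-write conflict.
store.conditional_write("room-101", version_a, "booked by A")
try:
    store.conditional_write("room-101", version_b, "booked by B")
    conflict_detected = False
except ConflictError:
    conflict_detected = True
```

A pessimistic design would instead make client B wait for a lock before it could read the record at all, trading responsiveness for the guarantee that no conflict can arise.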
Read Consistency :
CAP Theorem :
Some Consistency Models :
● Strong consistency
● Weak consistency
● Eventual consistency
Variations of Eventual Consistency :
● Causal consistency
● Read-your-writes consistency
● Session consistency
● Monotonic read consistency
● Monotonic write consistency
Stamps & Map-Reduce
Stamps :
Version stamps help you detect concurrency conflicts.
Version stamps can be implemented using counters, GUIDs, content hashes, timestamps, or a combination of these.
With distributed systems, a vector of version stamps allows you to detect when different nodes have conflicting updates.
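A vector of version stamps can be sketched as one counter per node. The Python below is an illustrative sketch only; the node names and the `compare` and `bump` helpers are hypothetical.

```python
def compare(a, b):
    """Compare two vector stamps given as dicts of node name -> counter.

    Returns "equal", "before" (a is older), "after" (a is newer),
    or "conflict" (each side has an update the other has not seen).
    """
    nodes = set(a) | set(b)
    a_behind = any(a.get(n, 0) < b.get(n, 0) for n in nodes)
    b_behind = any(b.get(n, 0) < a.get(n, 0) for n in nodes)
    if a_behind and b_behind:
        return "conflict"
    if a_behind:
        return "before"
    if b_behind:
        return "after"
    return "equal"

def bump(stamp, node):
    """Return a copy of the stamp with this node's counter incremented."""
    updated = dict(stamp)
    updated[node] = updated.get(node, 0) + 1
    return updated

base = {"blue": 1, "green": 1}
on_blue = bump(base, "blue")    # update accepted by node "blue"
on_green = bump(base, "green")  # concurrent update accepted by node "green"

print(compare(base, on_blue))
print(compare(on_blue, on_green))
```

The stamp only detects the conflict; deciding which update wins, or how to merge them, remains a domain-specific choice.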
Map-Reduce : Basic
Map-Reduce : Partitioning and Combining
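The map, combine, partition, and reduce steps named on these slides can be sketched in plain Python. This is a single-process illustration under assumed data: the order documents, the two "nodes", and the toy hash in `partition` are hypothetical and do not come from any real map-reduce framework.

```python
from collections import defaultdict

def map_order(order):
    """Map: emit one (product, quantity) pair per line item."""
    return [(item["product"], item["quantity"]) for item in order["line_items"]]

def combine(pairs):
    """Combine: pre-sum pairs on one node to reduce the data sent onward."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())

def partition(pairs, num_reducers):
    """Partition: route every pair with the same key to the same reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        index = sum(key.encode("utf-8")) % num_reducers  # stable toy hash
        buckets[index].append((key, value))
    return buckets

def reduce_bucket(pairs):
    """Reduce: sum all values for each key within one partition."""
    return dict(combine(pairs))

# Hypothetical orders stored on two nodes.
node_1 = [
    {"line_items": [{"product": "tea", "quantity": 2},
                    {"product": "coffee", "quantity": 1}]},
    {"line_items": [{"product": "tea", "quantity": 3}]},
]
node_2 = [{"line_items": [{"product": "coffee", "quantity": 4}]}]

# Map and combine run where the data lives.
combined = []
for node in (node_1, node_2):
    mapped = [pair for order in node for pair in map_order(order)]
    combined.extend(combine(mapped))

# Each partition is reduced independently, so reducers could run in parallel.
result = {}
for bucket in partition(combined, num_reducers=2):
    result.update(reduce_bucket(bucket))
print(result)
```

The combiner can reuse the reducer's logic here only because summing is associative and commutative; a reducer without that property could not be combined this way.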
Key-Value Databases
Comparison with Oracle :
What Is a Key-Value Store :
Key-value stores are the simplest NoSQL data stores to use from an API perspective.
Some popular key-value databases : Riak, MemcachedDB, Berkeley DB, HamsterDB, Amazon DynamoDB ...
Key-Value Store Features :
Consistency : applicable only to operations on a single key, since every operation is a GET, PUT, or DELETE on a single key.
Transactions : different key-value products have different specifications of transactions.
Query Features : queries are supported only by key.
Structure of Data : key-value stores don't care what is stored in the value part of the key-value pair. The value can be a blob, text, JSON, XML ...
Scaling : many key-value stores scale by using sharding.
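The features above fit in a few lines of Python. The sketch below assumes client-side sharding over in-memory dicts; `ShardedKeyValueStore` and its methods are hypothetical and do not mirror the API of Riak, Memcached, or any other product.

```python
import hashlib

class ShardedKeyValueStore:
    """Hypothetical client-side sharding over in-memory shards."""

    def __init__(self, num_shards):
        self._shards = [{} for _ in range(num_shards)]

    def _shard_for(self, key):
        # A stable hash of the key decides which shard owns it.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._shards)
        return self._shards[index]

    def put(self, key, value):
        # The value is opaque: the store never looks inside it.
        self._shard_for(key)[key] = value

    def get(self, key):
        return self._shard_for(key).get(key)

    def delete(self, key):
        self._shard_for(key).pop(key, None)

kv = ShardedKeyValueStore(num_shards=3)
kv.put("session:42", b'{"user": "ann", "cart": ["book"]}')
kv.put("user:ann", "<profile lang='en'/>")
print(kv.get("session:42"))
```

Every operation touches exactly one key and therefore exactly one shard, which is why this model spreads across nodes so easily. Plain modulo hashing is used only for brevity: changing the shard count would move most keys.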
Suitable Use Cases :
Storing Session Information : every web session is unique and is assigned a unique session ID value. Everything about the session can be stored with a single PUT and retrieved with a single GET, and this single-request operation makes it very fast. Solutions such as Memcached are used by many web applications.
User Profiles, Preferences : almost every user has a unique userId, username, or some other attribute, so the whole profile can be stored and fetched as a single object.
Shopping Cart Data : for e-commerce websites...
When Not to Use :
● Relationships among data
● Multioperation transactions
● Query by data
● Operations by sets
Document Databases
Comparison with Oracle :
What Is a Document Database :
Features :
Consistency : configured using replica sets, by choosing to wait for writes to be replicated to all the slaves or to a given number of slaves.
Transactions : transactions at the single-document level are known as atomic transactions. Transactions involving more than one operation are not possible.
Availability : document databases try to improve availability by replicating data using a master-slave setup. High availability is provided using replica sets.
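The choice between waiting for all slaves and waiting for a given number of them can be sketched as follows. This is a toy Python model, not the MongoDB driver API: `ReplicaSet`, its `write` method, the `wait_for` parameter, and the `reachable` flags are all hypothetical.

```python
class ReplicaSet:
    """Hypothetical replica set: one master plus slaves, all in memory."""

    def __init__(self, num_slaves):
        self.master = {}
        self.slaves = [{} for _ in range(num_slaves)]
        self.reachable = [True] * num_slaves  # toggled to simulate failures

    def write(self, key, value, wait_for):
        """Apply the write to the master, then replicate to reachable slaves.

        Returns True only if at least `wait_for` slaves confirmed the write.
        """
        self.master[key] = value
        confirmed = 0
        for slave, is_up in zip(self.slaves, self.reachable):
            if is_up:
                slave[key] = value
                confirmed += 1
        return confirmed >= wait_for

replicas = ReplicaSet(num_slaves=2)
ok_all = replicas.write("doc:1", {"title": "first"}, wait_for=2)

replicas.reachable[1] = False  # one slave goes down
ok_one = replicas.write("doc:2", {"title": "second"}, wait_for=1)
ok_two = replicas.write("doc:3", {"title": "third"}, wait_for=2)
```

Waiting for more slaves gives stronger consistency but makes writes fail, or stall, as soon as a slave is unreachable. Note that this sketch does not roll back the master when the acknowledgement fails; it only reports the outcome.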
Query Features : document databases provide different query features. CouchDB allows you to query via views.
One of the good features of document databases, as compared to key-value stores, is that we can query the data inside the document without having to retrieve the whole document by its key and then introspect the document.
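Querying inside the document can be illustrated with a small Python sketch. The `get_path` and `find` helpers and the `orders` collection are hypothetical; a real document database would answer such a query through its own query language and indexes rather than a linear scan.

```python
def get_path(document, dotted_path):
    """Follow a dotted path such as "address.city" into nested dicts."""
    current = document
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current

def find(collection, dotted_path, expected):
    """Return the documents whose field at dotted_path equals expected."""
    return [doc for doc in collection if get_path(doc, dotted_path) == expected]

# Hypothetical collection of order documents.
orders = [
    {"_id": 1, "customer": "ann", "address": {"city": "Chicago"}},
    {"_id": 2, "customer": "bob", "address": {"city": "Hanoi"}},
    {"_id": 3, "customer": "cid"},
]
matches = find(orders, "address.city", "Hanoi")
print([doc["_id"] for doc in matches])
```

With a key-value store the same question would require already knowing the key of every order, fetching each value, and parsing it in the application.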
Scaling : when a new node is added, it syncs up with the existing nodes, joins the replica set as a secondary node, and starts serving read requests.
Suitable Use Cases :
Event Logging : applications have different event logging needs; within the enterprise, there are many different applications that want to log events. Document databases can store all these different types of events and can act as a central data store for event storage.
Content Management Systems, Blogging Platforms : document databases work well in content management systems or applications for publishing websites, managing user comments, user registrations, profiles, and web-facing documents.
Web Analytics or Real-Time Analytics : document databases can store data for real-time analytics; since parts of the document can be updated, it's very easy to store page views or unique visitors.
E-Commerce Applications : these often need a flexible schema for products and orders, as well as the ability to evolve their data models without expensive database refactoring or data migration.
Column-Family Stores
Graph Databases
Topics :
● What Is a Graph Database?
● Features
● Suitable Use Cases
● When Not to Use
What Is a Graph Database :
● Graph databases allow you to store entities and relationships between these entities.
● We can query the graph in many ways.
● A query on the graph is also known as traversing the graph.
● In graph databases, traversing the joins or relationships is very fast.
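Traversal can be sketched with a plain adjacency list and a breadth-first search. The `friends` graph and the `within_hops` helper below are hypothetical Python illustrations, not the API of a graph database.

```python
from collections import deque

# Hypothetical social graph: node -> nodes it has a FRIEND relationship to.
friends = {
    "Anna": ["Barbara", "Carol"],
    "Barbara": ["Anna", "Dawn"],
    "Carol": ["Anna"],
    "Dawn": ["Barbara", "Elizabeth"],
    "Elizabeth": ["Dawn"],
}

def within_hops(graph, start, max_hops):
    """Breadth-first traversal: nodes reachable in at most max_hops steps."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]
    return distance

friends_of_friends = within_hops(friends, "Anna", 2)
print(friends_of_friends)
```

Each hop follows relationships that are already stored, so the cost depends on how much of the graph is visited rather than on recomputing a join at query time.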
Features :
● Consistency
● Transactions
● Availability
● Query Features
● Scaling
Neo4J :
● To make a relationship bidirectional, we have to create a relationship between the nodes in each direction.
● Relationships are first-class citizens in graph databases.
● Relationships don't only have a type, a start node, and an end node, but can have properties of their own.
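A relationship as a first-class object can be sketched with a small Python model. The `Relationship` and `Graph` classes, the node names, and the property values are hypothetical and are not the Neo4J API.

```python
from dataclasses import dataclass, field

@dataclass
class Relationship:
    """A relationship with a type, a start node, an end node, and properties."""
    rel_type: str
    start: str
    end: str
    properties: dict = field(default_factory=dict)

class Graph:
    """Hypothetical in-memory property graph."""

    def __init__(self):
        self.relationships = []

    def relate(self, start, rel_type, end, **properties):
        """Create a directed relationship from start to end."""
        rel = Relationship(rel_type, start, end, properties)
        self.relationships.append(rel)
        return rel

    def outgoing(self, node, rel_type):
        """Return the end nodes of this node's outgoing relationships."""
        return [r.end for r in self.relationships
                if r.start == node and r.rel_type == rel_type]

graph = Graph()
# A two-way friendship needs one relationship per direction.
graph.relate("Anna", "FRIEND", "Barbara", since=2010)
graph.relate("Barbara", "FRIEND", "Anna", since=2010)
# A one-way relationship carrying its own property.
likes = graph.relate("Anna", "LIKES", "Refactoring", rating=5)
print(graph.outgoing("Anna", "FRIEND"))
```

Keeping properties on the relationship itself, such as `since` or `rating`, lets a traversal filter on how two nodes are connected, not only on whether they are.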
Consistency :
● Graph databases ensure consistency through transactions.
● When running Neo4J in a cluster, a write to the master is eventually synchronized to the slaves.
● Slaves are always available for reads.
● Graph databases do not allow dangling relationships: the start node and the end node always have to exist.
Transactions :
● Neo4J is ACID-compliant.
● Before changing any nodes or adding any relationships to existing nodes, we have to start a transaction.
● Read operations can be done without initiating a transaction.
Availability :
● Neo4J achieves high availability by providing for replicated slaves.
● These slaves can also handle writes.
● Neo4J uses Apache ZooKeeper to keep track of the last transaction IDs persisted on each slave node and on the current master node.
Query Features :
● Graph databases are supported by query languages such as Gremlin.
● Gremlin is a domain-specific language for traversing graphs.
● Neo4J also has the Cypher query language for querying the graph.
● Neo4J allows you to query the graph for properties of the nodes, traverse the graph, or navigate the nodes.
● Properties of a node can be indexed using the indexing service.
● Neo4J uses Lucene as its indexing service.
Scaling :
● With graph databases, sharding is difficult.
● Three techniques are commonly used instead:
● Adding enough RAM that the working set of nodes and relationships is held entirely in memory.
● Adding more slaves with read-only access to the data.
● Sharding the data from the application side using domain-specific knowledge.
Suitable Use Cases :
● Connected data
● Routing, dispatch, and location-based services
● Recommendation engines
When Not to Use :
● Graph databases are a poor fit when you want to update all or a subset of entities.
Development Trends
Drivers of development :
• Avoiding unnecessary complexity.
• High throughput.
• Horizontal scalability and the ability to run on commodity hardware.
• The complexity and cost of setting up database clusters.
• The trade-off between reliability and high performance.
• Abandoning the idea that one database can solve every problem related to data storage.
• The dream of simple distribution and partitioning of centralized data models.
• The evolution of programming languages and frameworks.
• Meeting the requirements of cloud computing.
Classification (according to the CAP theorem) :
Contents
1. Introduction
2. Development Trends
3. Operating Principles
4. The MongoDB Database System
5. Summary
6. References
Operating Principles
CAP Theorem :
Partitioning :
• Memory caches
• Clustering
• Separating reads from writes
Storage models :
• Row-based storage
• Column-based storage
• Column-group storage
• Storage using the log-structured merge tree model
Query models :
• Queries similar to those of relational database systems
• Queries using scatter/gather
• Queries using B+ trees
Query performance evaluation :
The MongoDB Database System
Creating a collection :
    db.createCollection(<name>, {<configuration parameters>})
Defining a document : documents are written in JSON
    {
      title : "MongoDB",
      last_editor : "172.5.123.91",
      last_modified : new Date("9/23/2010"),
      body : "MongoDB is a ...",
      categories : ["Database", "NoSQL", "Document Database"],
      reviewed : false
    }
Inserting a document :
    db.<collection>.insert({ title: "MongoDB", last_editor: ... });
Retrieving a document :
    db.<collection>.find({ categories: ["NoSQL", "Document Database"] });
Updating a document :
    db.<collection>.save({ ... });
Updating data :
    db.<collection>.update(<criteria>, <new document>, <upsert>, <multi>);
Deleting documents :
    db.<collection>.remove({ <criteria> });
Creating an index :
    db.<collection>.ensureIndex({ <field1>: <sorting>, <field2>: <sorting>, ... });
Performance compared with SQL Server when executing INSERT statements :
Performance compared with SQL Server when executing simple queries :
Performance compared with SQL Server when executing complex queries :
Demo run comparing INSERT with MySQL (milliseconds) :
Summary
Advantages :
+ Delivers high performance and handles heavy loads.
+ Scales horizontally.
+ Runs on commodity hardware.
+ Meets the needs of cloud computing.
Disadvantages :
+ The vast majority of NoSQL systems are still under development.
+ Most are open-source software, which makes them hard to get accepted in large business environments.
+ The absence of constraints means data integrity is not guaranteed.
+ They do not meet the needs of many kinds of applications.
NoSQL or SQL :
Experts advise that vendors keep NoSQL in mind when developing applications, and that your application should move to NoSQL only when it is truly necessary.
References
1. NoSQL resources - http://nosql-database.org/
2. NoSQL wiki - http://en.wikipedia.org/wiki/NoSQL
3. Scalability wiki - http://en.wikipedia.org/wiki/Scalability#Scale_horizontally_.28scale_out.29
4. A Brief History of NoSQL - http://blog.knuthaugen.no/2010/03/a-brief-history-of-nosql.html
5. Nhất Quán Cuối Cùng (Eventual Consistency) - http://www.sqlviet.com/blog/nhat-quan-cuoi-cung
6. NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. Pramod J. Sadalage and Martin Fowler.
Thank You !