Download - NoSQL - West Virginia Universityjharner/courses/stat623/docs/NoSQL.pdf · 2014-11-10 · Selecting a NoSQL Database • Key-value is used for session information, preferences, proﬁles,

NoSQL Databases

Not Only SQLMore than one way to store data

• Not using the relational model

• Performs well on clusters

• No fixed schema

• Generally, open source

• Specialized for websites and big data

Why NoSQL?• Developers frustrated impedance mismatch

between relational model and application model

• Most projects use Entity-Relationship Mapping tools with a steep learning curve

• Relational databases do not run well on clusters. It would be impossible to run large scale websites (Twitter, Facebook) using a RDBMS

Aggregate Data Models• Modern applications need data grouped into

units for ACID purposes

• Aggregate data is easy to manage over a cluster

• Inter-aggregate relationships must be handled via map-reduce computations

• Also offer pre-computed materialized views

Data Distribution

• Sharding

segements the data by primary key into shards, each stored on a different server

• Two models used for distributing data across a cluster

Data Distribution

• Replication

copies all the data to multiple servers. Two forms:

• Master-slave

• Peer-to-peer

• Two models used for distributing data across a cluster

CAP Theorem

Choose any two:

• Consistency

• Availability

• Partition Toleration

C A

P

CAP Theorem

Consistency

All database clients will read the same value for the same query, even given concurrent updates

C A

P

CAP Theorem

Availability

All database clients will always be able to read and write data

C A

P

CAP TheoremPartition Tolerance

The database can be split into multiple servers and will continue to function if then network connection is interrupted

C A

P

CAP is the Key• SQL databases have a fixed capability (the

standard) and you can’t choose one feature over another

• NoSQL databases allows choice of best features for a particular system

• NoSQL requires knowledge to make a good choice

3 Faces of CAPCA (Consistency and Availability)

• Uses 2-phase commit

• Blocks system, so only works well in a single data center

3 Faces of CAPCP (Consistency and Partition Tolerance)

• Uses shards to scale

• Data is consistent, but still run the risk of some data becoming unavailable if a node fails

3 Faces of CAPAP (Availability and Partition Tolerance)

• System always available

• May return inaccurate data

• Similar to DNS

NoSQL Database Types• Key-Value

• Document

• Column-family

• Graph

Key-Value Databases• Simplest API: get, put, delete

• Data is just a blob

• Might not be persistent

• Examples: riak, memcached, redis

Document Databases• Similar to key-value except the value is in a

known format and is examinable

• Data is structured (most likely JSON, BSON, or XML)

• Might not be persistent

• Examples: CouchDB, MongoDB

Column Family Databases• Store rows that have many columns associated

with a row key

• Column families are groups of related data that is often accessed together (a customer’s profile would be one family, their orders another)

• Examples: cassandra, HBase

Graph Databases• Stores entities and the relationships between

them

• Entities have properties

• Edges (relations) can have properties and have directional significance

• Allows easy traversal of relationships

• Examples: Neo4J, Infinite Graph

Why Choose NoSQL?• To improve programmer productivity

• To improve data access performance by handling larger data sets, reducing latency, and/or improving throughput

Selecting a NoSQL Database• Key-value is used for session information,

preferences, profiles, and shopping carts. Normally data without relationships.

• Document are for content mangement, real-time analytics, and ecommerce. Avoid when aggregate data is often needed.

• Column family databases are useful for write-heavy operations like logging.

Selecting a NoSQL Database• Graph databases are optimal for social

networks, spatial data, and recommendation engines

Download - NoSQL - West Virginia Universityjharner/courses/stat623/docs/NoSQL.pdf · 2014-11-10 · Selecting a NoSQL Database • Key-value is used for session information, preferences, proﬁles,

Top Related