NoSQL Distilled - wordpress.com · 2012. 10. 26.
NoSQL Distilled: Chapters 1-3
Thursday, 25 October, 12
Chapter 1
• Data lasts longer than programs
• Until recently, SQL databases were where you stored data.
• The only question was which DB to use.
• SQL: struggles with very large data volumes
NoSQL
• What is NoSQL? It is poorly defined
• Examples: Cassandra, MongoDB, etc.
• Lose traditional consistency guarantees
• Gain easier development and scalability
• More recently read as ‘not only SQL’
• Authors claim: SQL is not going anywhere, but applications will use both SQL and NoSQL
• Relational databases are more flexible at retrieving data than the file system
• They use transactions for managing concurrency
• One DB - many applications
• SQL skills highly reusable - translate across applications and vendors.
Why SQL sucks
• SQL organizes data in tables and rows (relations and tuples)
• Simple, but cannot store rich object-oriented structures like nested classes or lists - “Aggregate objects”
• Object-oriented databases never took off
• ORMs are successful, but come with their own problems
• The internet changed the volume of data we need to store.
• Need very large computers or clusters
• Large computers are expensive and can only get so large
• SQL supports clustering & sharding
• but it is expensive, has a single point of failure, and complicates application coding.
• Google BigTable & Amazon Dynamo
• Cassandra ‘CQL - Exactly like SQL, except where it is not’
• NoSQL - no formal definition.
• The name comes from a meetup group.
• NoSQL DBs have no schema
• Most NoSQL DBs are open-source
Chapter 2: Aggregate Data Models
• SQL - Relational Model - simple structure
• NoSQL
• key-value*, document*, column-family*, graph
• * share a common characteristic of storing aggregate objects.
• aggregates are easy to shard and access from an application
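One way to picture why aggregates shard easily: each aggregate is stored whole under one key, so a stable hash of that key is enough to pick a node. This is a minimal sketch, not how any particular product does it; `shard_for`, `put`, `get`, and the four in-memory "nodes" are hypothetical names invented for illustration.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map an aggregate key to a shard index using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Hypothetical cluster: four nodes, each modelled as a plain dict.
shards = [dict() for _ in range(4)]

def put(key: str, aggregate: dict) -> None:
    """Store the whole aggregate on the shard chosen by its key."""
    shards[shard_for(key, len(shards))][key] = aggregate

def get(key: str) -> dict:
    """Fetch the aggregate from the single shard that owns its key."""
    return shards[shard_for(key, len(shards))][key]

put("user:42", {"name": "Ann", "emails": ["ann@example.com"]})
```

A read for `user:42` touches exactly one node, because nothing belonging to that aggregate lives anywhere else.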
• Relational databases don’t have a way to define aggregates
• SM - A User aggregate involves the tables: User, UserEmails, UserAvatars, etc.
• Aggregates are bad if you want to summarize data that is in many aggregates
• Transaction scope - Aggregates vs tables
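The User example above can be sketched as follows: the relational side keeps three flat tables, and the application has to stitch them into one nested object. The row layouts and the `load_user_aggregate` helper are assumptions made up for this illustration.

```python
# Hypothetical rows from the User, UserEmails and UserAvatars tables.
user_rows = [{"id": 1, "name": "Ann"}]
email_rows = [
    {"user_id": 1, "email": "ann@example.com"},
    {"user_id": 1, "email": "ann@work.example"},
]
avatar_rows = [{"user_id": 1, "url": "/avatars/1.png"}]

def load_user_aggregate(user_id: int) -> dict:
    """Join the three tables in application code into one nested object."""
    user = next(row for row in user_rows if row["id"] == user_id)
    return {
        "id": user["id"],
        "name": user["name"],
        "emails": [r["email"] for r in email_rows if r["user_id"] == user_id],
        "avatars": [r["url"] for r in avatar_rows if r["user_id"] == user_id],
    }

ann = load_user_aggregate(1)
```

An aggregate-oriented store would keep the resulting nested object as a single unit, which is also the natural scope for an atomic update.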
• key-value: access aggregates by a key
• document store: queries & indexes on fields within an aggregate.
• column-family
• Azure Table Service
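The difference between the first two store types can be sketched with two toy in-memory classes. `KeyValueStore` and `DocumentStore` are hypothetical names, not the API of any real database: the first treats the aggregate as an opaque blob, the second can see inside it.

```python
import json

class KeyValueStore:
    """The aggregate is opaque: the only operation is lookup by key."""
    def __init__(self):
        self._data = {}

    def put(self, key, blob):
        self._data[key] = blob

    def get(self, key):
        return self._data[key]

class DocumentStore:
    """The aggregate's structure is visible, so fields can be queried."""
    def __init__(self):
        self._data = {}

    def put(self, key, doc):
        self._data[key] = doc

    def get(self, key):
        return self._data[key]

    def find(self, field, value):
        # A real document store would use an index; this is a linear scan.
        return [d for d in self._data.values() if d.get(field) == value]

user = {"id": 1, "name": "Ann", "city": "Oslo"}

kv = KeyValueStore()
kv.put("user:1", json.dumps(user))   # stored as an opaque string
docs = DocumentStore()
docs.put("user:1", user)             # stored with visible fields
```

With the key-value store the application must know the key and decode the blob itself; with the document store it can ask for every user whose `city` is `"Oslo"`.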
Chapter 3
• Relationships between aggregates.
• Aggregate stores have different ways of representing these relationships.
• Can only update one aggregate atomically. Updating several aggregates in one transaction is awkward.
• If you have lots of relationships use SQL...
Graph DBs
• ... or a graph database
• different need than other NoSQL solutions.
• Need to easily model large sets of relationships - social networks, product preferences, RBAC?
• FlockDB - Simple. Nodes & Edges.
• Neo4j - Schemaless Java objects can represent Nodes or Edges
• SQL - complex relationship queries are expensive
• Graph DBs - traversing relationships is cheap because they are stored (cached) at insertion time.
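A rough sketch of the idea that the cost is paid at insert time: each edge is written straight into the source node's adjacency set, so a traversal only follows stored links and never joins. The `Graph` class and its `within` method are hypothetical; they are not the FlockDB or Neo4j API.

```python
from collections import defaultdict, deque

class Graph:
    def __init__(self):
        # node -> set of neighbours, maintained on every insert
        self._out = defaultdict(set)

    def add_edge(self, src, dst):
        """Record the relationship at insert time."""
        self._out[src].add(dst)

    def within(self, start, max_hops):
        """Nodes reachable from start in at most max_hops edges (BFS)."""
        seen = {start}
        frontier = deque([(start, 0)])
        while frontier:
            node, hops = frontier.popleft()
            if hops == max_hops:
                continue
            for nxt in self._out.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, hops + 1))
        seen.discard(start)
        return seen

g = Graph()
g.add_edge("ann", "bob")
g.add_edge("bob", "cat")
g.add_edge("cat", "dan")
```

Here `g.within("ann", 2)` is a friends-of-friends query; in SQL the same question would typically need one self-join per hop.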
No Schemas
• Don’t define upfront. Change anytime
• Good for non-uniform data
• Store exactly what you need - every time.
• Application code is the schema.
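"Application code is the schema" can be illustrated with a few non-uniform records. Nothing enforces their shape on write, so the reading code carries the assumptions. The records and the `contact_line` helper are made up for this sketch.

```python
# Non-uniform records: each stores only the fields it needs.
records = [
    {"name": "Ann", "email": "ann@example.com"},
    {"name": "Bob", "phone": "555-0100"},
    {"name": "Cat", "email": "cat@example.com", "nickname": "C"},
]

def contact_line(record: dict) -> str:
    """The implicit schema lives here: 'name' is required, the rest optional."""
    name = record["name"]  # raises KeyError if a record breaks the assumption
    contact = record.get("email") or record.get("phone") or "no contact"
    return f"{name}: {contact}"

lines = [contact_line(r) for r in records]
```

The trade-off is that a renamed or missing field shows up as a runtime error in code like `contact_line`, rather than as a rejected write.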
Views
• SQL - Easy to construct views
• NoSQL - may need to load many aggregate objects for a single view
• NoSQL - can have materialized views
• Update view when updating an aggregate
• Jobs that update views at intervals
• Aggregates - need to think about how you are going to query the data.
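The first materialized-view strategy (update the view when updating an aggregate) might look like the sketch below. The order layout, the `save_order` function, and the revenue-by-product view are assumptions chosen for illustration; amounts are integers (cents) to keep the arithmetic exact.

```python
from collections import defaultdict

orders = {}                              # aggregate store: order_id -> order
revenue_by_product = defaultdict(int)    # materialized view, in cents

def save_order(order_id: str, order: dict) -> None:
    """Write the aggregate and keep the view in step with it."""
    old = orders.get(order_id)
    if old is not None:
        # Back out the previous version's contribution to the view.
        for item in old["items"]:
            revenue_by_product[item["product"]] -= item["price"] * item["qty"]
    orders[order_id] = order
    for item in order["items"]:
        revenue_by_product[item["product"]] += item["price"] * item["qty"]

save_order("o1", {"items": [{"product": "book", "price": 1000, "qty": 2}]})
save_order("o2", {"items": [{"product": "book", "price": 1000, "qty": 1},
                            {"product": "pen", "price": 200, "qty": 3}]})
```

Reading `revenue_by_product["book"]` now needs no scan over the orders. The interval-job strategy would instead rebuild the view from `orders` on a schedule and accept that it is stale in between.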