NoSQL, Big Data, and Real Time - QConSP

©2013 DataStax Confidential. Do not distribute without consent. Benjamin Coverston DSE Architect, DataStax Inc. NoSQL, Big Data, and Real Time 1 Monday, September 2, 13

Post on 27-May-2020

TRANSCRIPT

©2013 DataStax Confidential. Do not distribute without consent.

Benjamin Coverston
DSE Architect, DataStax Inc.

NoSQL, Big Data, and Real Time

Monday, September 2, 13

Who am I?
• Ben Coverston
• DSE Architect
• DataStax since 2010
• Previous Experience in the Travel Industry
  • Low Cost Airlines / Web Reservations
• Past: HP / Accenture
• Lived in Santa Catarina for a few years.

What is it?

NoSQL

What is NoSQL?

NoSQL is a term coined by Carlo Strozzi and repurposed by Eric Evans to refer to “some” storage systems. The NoSQL term should be used as in the Not-Only-SQL and not as No to SQL or Never SQL.

-- Alex Popescu

What is NoSQL (Cont.)
• It’s not:
  • No to SQL
  • About performance
  • About scaling
  • ACID
  • Eventual consistency
  • Volume
• It is:
  • About choice

Diversity in Data
• Big Data has the 3 (or 4 or 5) V’s
  • Volume
  • Variety
  • Velocity
  • Variability (sometimes)
  • Value (other times)

Diversity in Data
• The V’s don’t cover everything
  • Availability is important
  • Your use case is important too

Is NoSQL Big Data?
• You can store Big Data with an RDBMS
  • Is it easy?
  • Is it cost effective?
  • What kind of compromises do you have to make?

The Problems
• In general there are two classes of data problems
  • OLTP (Real-Time)
  • Analytics (Batch)
• Usually you want both
• No solution is perfect for everyone
• Popularity is no indication of fitness

Use Cases
• OLTP
  • Low Latency
  • High Throughput
  • LOB Applications
• Batch
  • Predictive Models
  • Complex Queries
  • Tomorrow (or precalculated, but now we need OLTP)

Where to put your ‘Stuff’

• Sharded RDBMS
• MPP -- Greenplum, Teradata
• Hadoop
• Key/Value
• Columnar
• Other

Why not just one?
• Analytics
  • Optimize serial IO
  • Limitations in Storage
• OLTP
  • Working Set
  • Distribution
  • Availability
  • Storage Medium

Why Do We Need Something Else?
• ACID semantics are often overkill
• ACID also makes the database layer brittle
• This means you get less Availability (CAP Theorem)

The Application Stack

[Diagram: www.example.com served through load balancers LB1 to LB3, web servers ws1 to ws9, a cache, and numbered databases (DB 1 to 4)]

Sharding
• Storage Limitations
• Working Set
• So just make more!

(“The eBay Architecture,” Randy Shoup and Dan Pritchett)

But Sharding
• Is Painful
• Requires ‘something else’
• Most NoSQL solutions auto-shard
• Sharding requires tradeoffs.
  • Which means your application will need to change
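
The pain of hand-rolled sharding can be illustrated with a minimal sketch of application-side shard routing. The helper name `shard_for` and the hash-modulo scheme are assumptions made for illustration, not something shown in the talk. The sketch counts how many keys would land on a different shard when a fifth shard is added to four.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (hypothetical helper)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Route the same keys with 4 shards, then with 5, and count relocations.
keys = [f"user:{i}" for i in range(1000)]
before = {k: shard_for(k, 4) for k in keys}
after = {k: shard_for(k, 5) for k in keys}
moved = sum(1 for k in keys if before[k] != after[k])

print(f"{moved} of {len(keys)} keys change shard when going from 4 to 5 shards")
```

With plain modulo routing, most keys move when the shard count changes, which is one reason resharding tends to leak into application code.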

Which should I choose?
• Analytics
  • Hadoop (probably) if your data is big
  • Spark, other (sometimes faster) solutions available now
• NoSQL
  • Let’s talk!

Decisions are about tradeoffs, never a zero-sum game

Fast, Cheap, Good -- Choose Two

CAP Theorem
• More of two, less of one
  • Consistency
  • Availability
  • Partition tolerance
• You have to accept P
  • That leaves C and A
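
One common way replicated stores expose the C versus A choice is through read and write quorum sizes. The sketch below is an assumption added for illustration, not material from the slides: with N replicas, a read of R replicas is guaranteed to overlap a write acknowledged by W replicas only when R + W > N, and larger quorums tolerate fewer replica failures. The function names are hypothetical.

```python
def reads_see_latest_write(n: int, w: int, r: int) -> bool:
    """True when every read quorum overlaps every write quorum (R + W > N)."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return r + w > n


def tolerated_failures(n: int, quorum: int) -> int:
    """Replicas that may be down while an operation needing `quorum` acks succeeds."""
    return n - quorum


# Compare a few (W, R) settings for a 3-replica cluster.
for w, r in [(1, 1), (2, 2), (3, 1)]:
    print(
        f"W={w} R={r} overlap={reads_see_latest_write(3, w, r)} "
        f"write survives {tolerated_failures(3, w)} down, "
        f"read survives {tolerated_failures(3, r)} down"
    )
```

Small quorums keep the system answering when replicas are unreachable. Overlapping quorums give up some of that availability in exchange for reads that see the latest acknowledged write.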

How To Scale Anything
• Partition By Function
• Split Horizontally
• Avoid Distributed Transactions
• Avoid Synchronous Coupling
• Virtualize Everywhere
• Cache Everything

Partition By Function
• Don’t put everything in the same database
• Physical
  • Pools of Machines
  • Geographical Distribution
  • Automatic sharding (look for this)
    • Make sure it works!
• Virtual
  • Logical Tables, Schema
  • Not 100% necessary, but schema is nice

Partitioning (cont.)
• Pros
  • Isolate failure
    • To a region
    • To a service
  • Simplify Failover
• Cons
  • Your DB has to handle multi-region replication
    • If you chose CP (CAP) you’re going to have a bad time
    • AP systems do OK here (Cassandra actually excels)
  • “Relational” part of databases becomes complex
  • Everything gets denormalized

Split Horizontally
• Scaling Vertically is easy
  • To a point, then it gets expensive... fast...
• Easy if your system has no state to maintain
  • Or if the states are known, and small
  • Sharding over dependent fields complicates design
• Some things distribute themselves easily
  • key/value stores
• Others not so much
  • BTree indexes, foreign keys
• P2P architecture is helpful when splitting
  • In other words, avoid masters

Split Horizontally (cont)
• Pros
  • Can be as fast or faster than traditional design
  • Can scale up as long as you can afford more machines
  • Scaling is easy if you avoid having masters
  • Replication and failover don’t have to be special cases
• Cons
  • Even logical pieces of your app are distributed over many machines
    • example: your catalog is not all in one place
  • Real time analytics is difficult, or slower

Avoid Distributed Transactions
• Have you tried this?
• Hard to do right
• Paxos gives us some hope
  • CAS in Cassandra 2.0 looks promising
  • Even then, it’s not good for everything!
• MVCC works for many use cases
• Compensating Mechanisms
  • Customer Service (Amazon, inventory)
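
The compare-and-set (CAS) idea mentioned above can be sketched with a small in-memory store. This is an illustrative stand-in only: `VersionedStore` and `reserve_seat` are hypothetical names, the lock simulates what a real database would do with a consensus protocol, and none of this is Cassandra's actual API.

```python
import threading


class VersionedStore:
    """In-memory stand-in for a store offering compare-and-set (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # key -> (version, value)

    def read(self, key):
        with self._lock:
            return self._data.get(key, (0, None))

    def compare_and_set(self, key, expected_version, new_value) -> bool:
        """Write only if nobody else has written since `expected_version`."""
        with self._lock:
            version, _ = self._data.get(key, (0, None))
            if version != expected_version:
                return False
            self._data[key] = (version + 1, new_value)
            return True


def reserve_seat(store, key, retries=5) -> bool:
    """Decrement a seat count without a lock held across read and write."""
    for _ in range(retries):
        version, seats = store.read(key)
        if not seats:
            return False  # sold out or unknown flight
        if store.compare_and_set(key, version, seats - 1):
            return True
        # Someone else won the race; re-read and try again.
    return False


store = VersionedStore()
store.compare_and_set("flight-42", 0, 2)  # two seats available
results = [reserve_seat(store, "flight-42") for _ in range(3)]
print(results)
```

The retry loop is the price of optimistic concurrency: conflicting writers are detected rather than blocked, so contended keys cost extra round trips.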

Avoid Distributed Transactions
• Pros
  • Consistency in a distributed environment
• Cons
  • Slow
  • Overkill
  • Did I say slow?
  • We chose CP so we get less A
  • What happens when they don’t succeed?
    • Do we shut the whole thing down?

Avoid Synchronous Coupling
• What?
  • A or B can be down
    • A can be down, B continues to work
    • B can suffer, while A continues to work
  • If your recommendation engine fails, your customers can still buy stuff!
• Master/Slave failover is a good example of synchronous coupling
  • Master is down, slave needs to take over, but in the meantime... what happens?
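
The recommendation-engine example can be sketched as a checkout path that treats recommendations as optional. All function names here are hypothetical stand-ins, and the failing service is simulated by raising `ConnectionError`.

```python
def fetch_recommendations(user_id):
    """Stand-in for a call to a separate recommendation service (hypothetical)."""
    raise ConnectionError("recommendation service is down")


def place_order(user_id, item):
    """Stand-in for the critical purchase path."""
    return {"user": user_id, "item": item, "status": "confirmed"}


def checkout(user_id, item):
    # The purchase happens first and does not depend on recommendations.
    order = place_order(user_id, item)
    try:
        order["recommendations"] = fetch_recommendations(user_id)
    except ConnectionError:
        # Degrade gracefully: an empty list instead of a failed checkout.
        order["recommendations"] = []
    return order


print(checkout("u1", "book"))
```

In a real system the optional call would usually also carry a timeout, so that a slow dependency cannot stall the critical path.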

Avoid Synchronous Coupling
• Pros
  • Fewer shared dependencies means less failure
  • Less failure means more total uptime
    • For the whole
  • Less coupling means that your application topology is more modular
  • Introducing new, decoupled services is less risky
• Cons
  • More duplication of your infrastructure
    • e.g. now you have an application stack for each of your services.

Avoid Synchronous Processing Flows
• AKA
  • Blocking Sockets
  • Serialized Processes
  • Locking in General
• Do what is important FIRST
  • Take their money
  • Modify Inventory
• Other less important stuff can be queued
  • Triggers
  • Joins
  • Stored Procedures
  • Consistency Checks
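
"Do what is important first, queue the rest" might look like the following sketch, which uses Python's standard `queue` and `threading` modules as a stand-in for a real message queue. The task names and the `handle_purchase` function are illustrative assumptions.

```python
import queue
import threading

deferred = queue.Queue()  # holds the less important work
log = []


def worker():
    """Drain deferred tasks in the background until a None sentinel arrives."""
    while True:
        task = deferred.get()
        if task is None:
            deferred.task_done()
            break
        task()
        deferred.task_done()


def handle_purchase(order_id):
    # Critical steps run inline, before the caller gets a response.
    log.append(f"charged {order_id}")
    log.append(f"inventory updated {order_id}")
    # Nice-to-have work is queued instead of blocking the request.
    deferred.put(lambda: log.append(f"consistency check {order_id}"))
    return "ok"


t = threading.Thread(target=worker, daemon=True)
t.start()
handle_purchase("o-1")
deferred.put(None)  # tell the worker to stop
deferred.join()     # wait until everything queued has been processed
t.join()
print(log)
```

The queue depth is also a natural thing to monitor: a growing backlog points at the stage that cannot keep up.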

Avoid Synchronous Processing Flows
• Pros
  • Critical operations will not block for nice-to-haves
  • Easy to monitor queues and assign priority to tasks
  • Problem areas are easier to identify
• Cons
  • Race conditions
  • More up front development cost

Virtualize Everything
• DON’T
  • Pick your database because it has a sexy API
  • Pick your database because it worked for somebody else
• DO
  • Pick a database that will fit with your use case
  • Virtualize your data model
    • Encourage manipulation of your logical models
    • DO NOT force interaction with your database
  • Good virtualization means that you can change your data store later...
    • And most of your code will still work.
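
One way to read "virtualize your data model" is to have application code depend on a small interface rather than on a specific database client. The sketch below assumes a repository-style abstraction. `Customer`, `CustomerRepository`, and the in-memory implementation are hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    customer_id: str
    name: str


class CustomerRepository(ABC):
    """Logical model access; says nothing about which data store is behind it."""

    @abstractmethod
    def save(self, customer: Customer) -> None: ...

    @abstractmethod
    def find(self, customer_id: str) -> Optional[Customer]: ...


class InMemoryCustomerRepository(CustomerRepository):
    """One interchangeable backend; a database-backed class would mirror it."""

    def __init__(self):
        self._rows = {}

    def save(self, customer: Customer) -> None:
        self._rows[customer.customer_id] = customer

    def find(self, customer_id: str) -> Optional[Customer]:
        return self._rows.get(customer_id)


def rename_customer(repo: CustomerRepository, customer_id: str, new_name: str) -> bool:
    """Application logic written only against the interface."""
    customer = repo.find(customer_id)
    if customer is None:
        return False
    customer.name = new_name
    repo.save(customer)
    return True


repo = InMemoryCustomerRepository()
repo.save(Customer("c-1", "Ana"))
print(rename_customer(repo, "c-1", "Ana Souza"), repo.find("c-1"))
```

Swapping the data store then means writing another `CustomerRepository` implementation, while `rename_customer` and similar code stay unchanged.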

Virtualize Everything
• Virtualization isn’t just for the programmer
• Things fall apart
  • Requests have to be re-routed
  • Parts Replaced
  • APIs change
• Good virtualization means you can make changes without impacting availability

Cache Appropriately
• You can’t cache everything
• But you can cache stuff that doesn’t change
• Or is expensive to retrieve
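
Caching values that rarely change or are expensive to fetch is often done with a cache-aside pattern and an expiry time. The following is a minimal sketch under that assumption. `TTLCache`, `get_or_load`, and `slow_lookup` are hypothetical names, and the 60-second TTL is arbitrary.

```python
import time


class TTLCache:
    """Cache-aside helper: load on miss, reuse until the entry expires."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}   # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit
        value = loader(key)  # miss or expired: go to the slow source
        self._entries[key] = (now + self._ttl, value)
        return value


calls = []


def slow_lookup(key):
    """Stand-in for an expensive database or service call."""
    calls.append(key)
    return key.upper()


cache = TTLCache(ttl_seconds=60)
cache.get_or_load("gru", slow_lookup)
cache.get_or_load("gru", slow_lookup)
print(f"slow lookups performed: {len(calls)}")
```

The TTL bounds how stale a value can get, which is a simple (if blunt) answer to the coherence problem. It does not address what happens when the cache itself is unavailable.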

Cache Appropriately
• Pros
  • Cache is fast (compared to traditional RDBMS access)
  • Can give you a performance buffer
• Cons
  • Cache Coherence
  • Cache Dependency
    • Is it a SPOF?
  • What if it all doesn’t fit?

What about NoSQL?
• All of this applies
• Evaluate Products on their Strengths
• If easy things are easy
  • The hard might be impossible
• Pick something that makes the hard things possible

What are the ‘easy’ things?
• Serialization Formats
  • JSON/BSON
• Data Models
• HTTP/REST/JSON APIs
• NodeJS Drivers!
• etc.

The ‘hard’ things
• Automatic Sharding
  • Where does the data go?
  • How do I find it?
  • How do I add another?
• Multi DC
  • Replication
  • No SPOF
• Anti-Entropy
• Continuous Availability
  • Upgrades
  • Failure
• Etc.
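
The three automatic-sharding questions (where does the data go, how do I find it, how do I add another node) are commonly answered with consistent hashing. The sketch below is a generic illustration of that technique, not the implementation of any particular database. `HashRing`, the node names, and the choice of 64 virtual nodes are assumptions.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable 64-bit token derived from MD5 (used for placement, not security)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), vnodes=64):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (token, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each node owns several tokens so load spreads more evenly.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        """Walk clockwise from the key's token to the next node token."""
        if not self._ring:
            raise LookupError("ring is empty")
        index = bisect.bisect_right(self._ring, (_hash(key), ""))
        return self._ring[index % len(self._ring)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to the new node")
```

Only the keys claimed by the new node relocate, and every client that knows the ring can compute a key's location without asking a master.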

What should you use?
• Your decision
• Not every database is a fit for every problem.

DataStax Enterprise
• DSE
  • Cassandra (OLTP)
  • Analytics
  • Search
• The hard things are possible
• We’re making the easy things easier
