Bigger, Better, Faster, More
An Introduction to Super-Scalability
But first… THE ARMS RACE
1 ENIAC 1 Teletype
1 Mainframe N Terminals
N Servers N Terminals
N Servers N PCs
N Web Servers N Browsers
N Web Servers N AJAX Apps
N Clusters N AJAX Apps
N Clusters N*M Phones
N Cloudlets N*M Phones
And So On…
What is Scalability?
Scalability = Ability to do More
More What?
More Processing
Processing Takes Resources
Types of Resources
CPU Disk Memory Network
Types of Utilization
Time / Throughput
Space / Capacity
Complexity
Locking
Resources & Utilization
We Want More! (but how to scale?)
How to Scale
Just make it bigger (vertical scaling)
We Want Even More! (super-scalability)
Scaling Strategies
Space Bigger
Complexity Better
Time Faster
Locking More
Bigger (Space)
Not Super
One big data store
One big memory store
Make it bigger
Make it redundant
E.g. Full activity logging
Partitioning
Sharding / Hashing
Growth = Add Partition
Tradeoff: Splitting Partitions
Tradeoff: Redundancy becomes a distribution problem
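A minimal sketch of sharding by hash, assuming in-memory dicts stand in for partitions. The names `shard_for` and `ShardedStore` are hypothetical, and the modulo scheme is only one possible choice; it makes the "splitting partitions" tradeoff visible, because adding a partition moves most keys.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash (Python's hash() is salted per process)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Hypothetical key-value store split across N in-memory partitions."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        return self.shards[shard_for(key, len(self.shards))].get(key)

    def resharded(self, num_shards: int) -> "ShardedStore":
        """Growth = add partition. With modulo hashing, every key must be re-placed."""
        bigger = ShardedStore(num_shards)
        for shard in self.shards:
            for key, value in shard.items():
                bigger.put(key, value)
        return bigger
```

Schemes such as consistent hashing aim to reduce how many keys move on growth; this sketch deliberately does not attempt that.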
[Diagram: partitions A, B, C, …]
Better (Complexity)
Not Super
Number of objects increases
As relations increase, add time or space requirements
Common with graph problems
E.g. PageRank
Distribution
Chop up problem / workload
Map/Reduce
Tradeoff: coordination
Tradeoff: network
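A small sketch of the Map/Reduce idea, assuming a word count as the workload. Everything runs in one process here; `chunk`, `map_count`, and `word_count` are hypothetical names, and in a real deployment each chunk would go to a separate worker (which is where the coordination and network tradeoffs come from).

```python
from collections import Counter
from functools import reduce


def chunk(items, parts):
    """Chop the workload into at most `parts` roughly equal slices."""
    size = max(1, -(-len(items) // parts))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def map_count(lines):
    """Map step: count words in one slice, independently of all other slices."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def word_count(lines, workers=3):
    """Run the map step per slice, then reduce the partial counts into one result."""
    partials = [map_count(part) for part in chunk(lines, workers)]
    return reduce(lambda a, b: a + b, partials, Counter())
```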
Faster (Time)
Not Super
Tune your code
Tune your database
Tune your network
Better hardware
Optimization
As fast as possible
Can’t scale as fast as growth
Specialization – ONE thing
Caching – Reduces work in trade for space
Tradeoff: space
Tradeoff: coordination
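A minimal caching sketch using the standard library's `functools.lru_cache`. The function `expensive_lookup` is a hypothetical stand-in for slow work; `maxsize` bounds the space side of the trade, and the coordination side shows up as staleness, since a cached answer is not refreshed until it is evicted or the cache is cleared.

```python
from functools import lru_cache


@lru_cache(maxsize=1024)  # tradeoff: space, bounded to 1024 remembered results
def expensive_lookup(n: int) -> int:
    """Stand-in for slow work (a query, a remote call, a heavy computation)."""
    return sum(i * i for i in range(n))


def invalidate() -> None:
    """Tradeoff: coordination. Someone has to decide when cached answers are stale."""
    expensive_lookup.cache_clear()
```

Calling `expensive_lookup.cache_info()` reports hits and misses, which is one way to check whether the cache is earning its space.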
More (Locking)
Not Super
One at a time
Serialized access
Parallelizing / Estimating
Separate reads & writes
Non-locking estimation
Reduce contention
Tradeoff: space
Tradeoff: coordination
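One way to picture "reduce contention" and "non-locking estimation" is a counter split into stripes. This is a sketch under assumptions, not a technique the slides name: `StripedCounter` is a hypothetical class, writers lock only their own stripe, and readers sum the stripes without taking any lock, so a read taken while writers are active is an estimate.

```python
import threading


class StripedCounter:
    """Counter split into stripes so concurrent writers rarely share a lock."""

    def __init__(self, stripes: int = 8) -> None:
        self._counts = [0] * stripes  # tradeoff: space, one slot per stripe
        self._locks = [threading.Lock() for _ in range(stripes)]

    def increment(self, amount: int = 1) -> None:
        # Pick a stripe from the calling thread's id; only that stripe is locked.
        i = threading.get_ident() % len(self._counts)
        with self._locks[i]:
            self._counts[i] += amount

    def estimate(self) -> int:
        # Non-locking read: may lag behind in-flight writes, exact once writers stop.
        return sum(self._counts)
```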
But Which Technique(s)?
It Depends!
All: Divide & Conquer
Partitions: Data & Processing
Sharding
Worker Processes
Coordination: Distribution & Ordering
Queues & Managers
Separate Read/Write Access
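A rough sketch of the queue-plus-workers shape, using threads and `queue.Queue` from the standard library. `run_workers` is a hypothetical helper acting as the manager; it assumes jobs are hashable and unique so results can be keyed by job.

```python
import queue
import threading


def run_workers(jobs, handler, num_workers=4):
    """Manager: feed jobs through a queue to a pool of workers, collect results by job."""
    tasks = queue.Queue()
    results = {}
    results_lock = threading.Lock()

    def worker():
        while True:
            job = tasks.get()
            try:
                if job is None:  # sentinel: no more work for this worker
                    return
                try:
                    outcome = handler(job)
                except Exception as exc:  # keep the worker alive; report the failure
                    outcome = exc
                with results_lock:
                    results[job] = outcome
            finally:
                tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for job in jobs:
        tasks.put(job)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    tasks.join()
    for t in threads:
        t.join()
    return results
```

The queue gives distribution but not ordering: workers may finish jobs in any order, which is why results are keyed rather than appended.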
What does this make the system look like?
And now… SOME THEORY
ACID: reliable transaction systems
Atomicity – all or nothing
Consistency – always correct
Isolation – changesets executed independently
Durability – once committed, stays so
Really hard to scale in one big block (although SSDs + RAM help!)
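To make "all or nothing" concrete, here is a small sketch using Python's built-in `sqlite3` module. The `accounts` table and the helpers `make_bank`, `transfer`, and `balances` are hypothetical; the point is only that a failed transfer leaves no partial update behind.

```python
import sqlite3


def make_bank(initial):
    """Create an in-memory database with one hypothetical `accounts` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", list(initial.items()))
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move money atomically: both updates commit together or neither does."""
    with conn:  # commits on success, rolls back if an exception escapes the block
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```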
Maybe It’s Not so Important? (it depends)
BASE is easier
Basically Available
Soft State
Eventual Consistency
A node will either eventually get a change or retire
Well… still need conflict resolution
BASE is NOT ACID (get it?)
Can we have a Balanced pH?
CAP Theorem
Choose TWO:
Consistency
Availability
Partition tolerance
[Diagram: Manager, Replica 1, Replica 2, Client 1, Client 2; caption: "Double Outage!"]
Designing a scalable system
It Depends!
Understand Your Scale Points
Log
Profile
Tune
Test
Divide
Compare
Partition
No, really, log a lot
Fallacies of Distributed Computing
1. The network is reliable.
2. Latency is zero.
3. Bandwidth is infinite.
4. The network is secure.
5. Topology doesn't change.
6. There is one administrator.
7. Transport cost is zero.
8. The network is homogeneous.
SOME “SCALY” TOOLS
CQRS Pattern
Separate operations for:
Command – perform an action
Query – returns data about state
Promotes simpler programs
Allows Command Queues
Reduces locking
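A toy sketch of the CQRS split, assuming a "likes" feature as the domain. `LikeCommands`, `LikeQueries`, and `drain` are hypothetical names: commands only enqueue actions, queries only read a separate view, and a drain step applies queued commands to that view.

```python
from collections import deque


class LikeCommands:
    """Write side: performs actions by queuing them; returns no state."""

    def __init__(self) -> None:
        self.queue = deque()

    def like(self, user_id: str, item_id: str) -> None:
        self.queue.append(("like", user_id, item_id))


class LikeQueries:
    """Read side: answers questions from its own view; never mutated by callers."""

    def __init__(self) -> None:
        self._items_by_user = {}

    def apply(self, command) -> None:
        kind, user_id, item_id = command
        if kind == "like":
            self._items_by_user.setdefault(user_id, set()).add(item_id)

    def items_liked_by(self, user_id: str):
        return sorted(self._items_by_user.get(user_id, ()))


def drain(commands: LikeCommands, queries: LikeQueries) -> None:
    """Apply queued commands to the read view, one at a time, in order."""
    while commands.queue:
        queries.apply(commands.queue.popleft())
```

Until `drain` runs, queries see the old state, which is the command-queue version of eventual consistency.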
A Scaly Stack
SaaS
• Applications
PaaS
• Storage
• Identity
• Runtime
• Queue / Bus
IaaS
• Compute
• Block Data
• Network
Infrastructure as a Service
Component    Example
Compute      Amazon EC2, Azure Web/Worker Roles
Storage      Amazon S3, Azure TableStore
Network      Any CDN
Platform as a Service
Component    Example
Database     SQL Azure, Postgres, MySQL
NoSQL        Cassandra, Redis, BigTable, MongoDB
Cache        Memcache
Queue        Azure Service Bus
Processing   Hadoop, Storm
Application as a Service
Salesforce? (Also sort of a platform)
Whateva!
AN EXAMPLE: Cassandra
Cassandra
A “scalable” key-value store
Automatic partitioning
Automatic replicas
Cassandra Data Model
So All is Good, Right?
A RELATIONAL EXAMPLE
Worse than SQL Tuning?
Our Database
Know your Access Patterns
Get user by user id
Get item by item id
Get all the items that a particular user likes
Get all the users who like a particular item
Cassandra Model #1: Relational-y
Can’t get all the items that a particular user likes (without a giant scan)
Cassandra Model #2: Indexes
N-M relationship is modeled with two tables. But Properties require secondary lookups.
Cassandra Model #3: Denormalization
Can put some data in the indexes if your queries need it. (Or serialize data.)
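A sketch of Models #2 and #3 using plain Python dicts in place of Cassandra tables; this is not Cassandra's API, and `LikesModel` with its field names is hypothetical. The two index maps answer both "likes" access patterns without a scan, and copying the name or title into the index avoids the secondary lookup.

```python
class LikesModel:
    """In-memory stand-in for four tables: users, items, and two like indexes."""

    def __init__(self) -> None:
        self.users = {}           # user_id -> {"name": ...}
        self.items = {}           # item_id -> {"title": ...}
        self.items_by_user = {}   # user_id -> {item_id: title}   (denormalized copy)
        self.users_by_item = {}   # item_id -> {user_id: name}    (denormalized copy)

    def add_user(self, user_id: str, name: str) -> None:
        self.users[user_id] = {"name": name}

    def add_item(self, item_id: str, title: str) -> None:
        self.items[item_id] = {"title": title}

    def like(self, user_id: str, item_id: str) -> None:
        # One logical write fans out to both indexes: the cost of denormalizing.
        title = self.items[item_id]["title"]
        name = self.users[user_id]["name"]
        self.items_by_user.setdefault(user_id, {})[item_id] = title
        self.users_by_item.setdefault(item_id, {})[user_id] = name

    def items_liked_by(self, user_id: str) -> dict:
        return dict(self.items_by_user.get(user_id, {}))

    def users_who_like(self, item_id: str) -> dict:
        return dict(self.users_by_item.get(item_id, {}))
```

The copies can go stale if a user is renamed or an item retitled; keeping them in step is the price of skipping the second lookup.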
Cassandra Model #4: SuperColumns?
SuperColumns let you store other dimensions of data. (eek?)
Cassandra Model #5: Time order
Composite (sorted) column keys let you do neat things like time-order the mapping.
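The time-ordering idea can be sketched with sorted `(timestamp, item_id)` tuples per user row, using the standard `bisect` module. This is an in-memory illustration, not Cassandra's API; `TimeOrderedLikes` and its methods are hypothetical names.

```python
import bisect


class TimeOrderedLikes:
    """Each user row keeps its columns sorted by a composite (timestamp, item_id) key."""

    def __init__(self) -> None:
        self._rows = {}  # user_id -> sorted list of (timestamp, item_id)

    def add(self, user_id: str, timestamp: int, item_id: str) -> None:
        bisect.insort(self._rows.setdefault(user_id, []), (timestamp, item_id))

    def latest(self, user_id: str, n: int):
        """Most recent n likes, newest first, read off the end of the sorted row."""
        row = self._rows.get(user_id, [])
        return [item for _, item in reversed(row[max(0, len(row) - n):])]

    def between(self, user_id: str, start: int, end: int):
        """Likes with start <= timestamp < end, oldest first, found by binary search."""
        row = self._rows.get(user_id, [])
        lo = bisect.bisect_left(row, (start,))
        hi = bisect.bisect_left(row, (end,))
        return [item for _, item in row[lo:hi]]
```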
IT DEPENDS!
Roll your own model – see www.datastax.com for great data model articles
Conflict Resolution in Cassandra
Each Tuple has a Timestamp
Last change wins
Requires clock synchronization
(Working on other strategies)
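A minimal sketch of last-change-wins merging between two replicas, assuming integer timestamps. `Cell` and `merge` are hypothetical names, and the tie-breaking rule (keep the first replica's value on equal timestamps) is an assumption made for this illustration only.

```python
from typing import Any, NamedTuple


class Cell(NamedTuple):
    value: Any
    timestamp: int  # only meaningful if the writers' clocks are synchronized


def merge(a: dict, b: dict) -> dict:
    """Combine two replicas' views; for each key the newer timestamp wins."""
    merged = dict(a)
    for key, cell in b.items():
        current = merged.get(key)
        # Assumption: on equal timestamps, keep the value already taken from `a`.
        if current is None or cell.timestamp > current.timestamp:
            merged[key] = cell
    return merged
```

If one writer's clock runs ahead, its older change can beat a genuinely newer one, which is why the slide calls out clock synchronization.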
But wait, there’s more… THE FUTURE
N*M*Q Cloudlets N*M*Q Devices
The Internet of Things
It’s coming. Can your servers handle it?
Things are Getting Smarter
Arduino
Netduino
Raspberry Pi ($25)
Servers will do Server Things
Cross-thing sharing
Data storage
Analysis
How Will We Survive?
Communication
Network Effect
Analytics
Cell Computing
Self-sufficient unit of scale
All components required to operate a portion of workload
Known performance characteristics
Known cost to interact with other cells
THINK BIG
How big is your project?
Some Scale
50,000 doctors
100 editors
500GB of data
Does it matter?