Meeting application performance needs: scaling up versus scaling out
Meeting app performance needs – scaling up/scaling out 1
Do you know how to eat an elephant? 2
¨ One bite at a time!
¨ Divide and Conquer.
A practical problem 3
¨ Coca-Cola needs to analyze consumer sentiment about the Diet Coke brand across popular social networks.
¤ What type of machine would they need?
¤ Will all the data even fit on the biggest, most expensive machine you can buy today?
The Need for Speed 4
¨ High-performance architectures need more and more resources as demand grows.
¨ Methods of adding more resources for a particular application fall into two categories:
¤ Scale up (vertical) versus scale out (horizontal)
¤ Get a bigger machine versus add more small machines
Scale Up (scale vertically) 5
¨ Get a bigger machine
¨ Add resources to a single node in a system
¤ typically by adding CPUs or memory to a single computer.
¨ Vertical scaling of existing systems
¤ enables effective virtualization
¤ provides more resources for the hosted set of operating system and application modules to share.
¨ Taking advantage of such resources in a single computer can also be called "scaling up"
¤ such as expanding the number of Apache daemon processes currently running.
Scale Out (scale horizontally) 6
¨ Add more nodes to a collection of machines
¤ such as adding a new computer to a distributed software application.
¤ An example might be scaling out from one web server system to three.
¨ Large numbers of low-cost "commodity" systems
¤ become more attractive as computer prices drop and performance continues to increase.
¨ Hundreds or thousands of small computers can be configured in a cluster to obtain aggregate computing power that often exceeds that of a single traditional RISC-processor-based scientific computer.
¨ Scaling out has been fueled by the availability of high-performance interconnects (e.g., Myrinet and InfiniBand).
Trade offs 7
¨ Larger numbers of computers mean
¤ increased management complexity,
¤ a more complex programming model,
¤ throughput and latency issues between nodes,
¤ and some applications do not lend themselves to a distributed computing model.
¨ Configuring an existing idle system has always been less expensive than buying, installing, and configuring a new one, regardless of the model.
Scale Up versus Scale Out 8
Choosing between Scale up/Scale Out 9
¨ Scale up:
¤ You have a hard limit: the size of the machine on which you are running
¨ Scale out:
¤ Not limited to the capacity of a single unit
¤ Combine the power of multiple machines into a single pool
Scale Up versus Scale Out 10
¨ In concept:
¤ In both cases we break a sequential piece of logic into smaller pieces that can be executed in parallel.
¨ In practice:
¤ The two models are fairly different from an implementation and performance perspective.
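The "in concept" point above can be sketched in a few lines of Python. This is a minimal illustration, not a performance recipe: the function names (`chunked`, `parallel_sum_of_squares`) and the sum-of-squares workload are hypothetical, and a thread pool is used only because it is the simplest way to show the split. In CPython, CPU-bound work like this would need a process pool or several machines to actually run faster.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, n_chunks):
    """Split items into at most n_chunks contiguous slices."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def sequential_sum_of_squares(numbers):
    """The original sequential piece of logic."""
    return sum(x * x for x in numbers)


def parallel_sum_of_squares(numbers, workers=4):
    """Break the same logic into smaller pieces and combine the partial results."""
    chunks = chunked(numbers, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is processed independently by a worker thread.
        partials = list(pool.map(sequential_sum_of_squares, chunks))
    return sum(partials)
```

The same split-then-combine shape applies whether the pieces run on cores of one machine (scale up) or on separate machines (scale out); what changes is how the pieces and their results are moved around.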
Scale Up versus Scale Out 11
Scale Up:
¨ Concurrent programming on multi-core machines is often done through multi-threading and in-process message passing.
¨ Single large multi-core machines are best utilized in the context of a single application through concurrent programming.
Scale Out:
¨ Distributed programming does something similar by distributing jobs across machines over the network.
¨ Patterns used are:
¤ MapReduce – Google (2004)
¤ Master/Worker
¤ Tuple Spaces
¤ Blackboard
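To make the MapReduce pattern a little more concrete, here is a toy word count that runs the map, shuffle, and reduce phases in a single process. It is a sketch of the pattern's shape only, not of Google's or Hadoop's implementation; all function names are hypothetical, and in a real deployment each phase would be spread across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in order over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Because each call to `map_phase` looks at one document and each call to `reduce_phase` looks at one key, both phases can be handed to different workers without coordination between them.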
Scale Up versus Scale Out 12
Scale Up:
¨ Existence of a shared address space
¨ Data sharing and message passing can be done simply by passing a reference.
Scale Out:
¨ Lack of a shared address space
¨ Makes sharing, passing, or updating data significantly more complex
¨ Must pass copies of the data, which adds network, serialization, and deserialization overhead
¨ Once you cross the boundaries of a single process, you need to deal with partial failure and consistency
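The difference between passing a reference and passing a copy can be shown without any network at all. In this sketch, `pickle` stands in for whatever serialization a real distributed system would use, and the function names and the shopping-cart data are hypothetical.

```python
import pickle


def update_in_process(cart):
    """Shared address space: the callee mutates the caller's object directly."""
    cart["items"].append("diet coke")


def update_across_boundary(payload):
    """No shared address space: the callee only ever sees a deserialized copy."""
    cart = pickle.loads(payload)      # deserialization overhead
    cart["items"].append("diet coke")
    return pickle.dumps(cart)         # serialization overhead on the way back


local = {"items": []}
update_in_process(local)              # the caller sees the change immediately

remote = {"items": []}
reply = update_across_boundary(pickle.dumps(remote))
updated = pickle.loads(reply)         # the caller must apply the reply itself
```

After this runs, `local` has been changed in place, while `remote` is untouched and only `updated` carries the new item. If the reply were lost, the caller's copy and the callee's copy would disagree, which is the partial-failure and consistency concern in miniature.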
Why Scale Out 13
¨ Cost/performance flexibility:
¤ Optimize cost/performance by selecting the optimal configuration at any time.
¤ If your system is designed for scale-up only, you are pretty much locked into a certain minimum price driven by the hardware that you are using.
¤ In a competitive situation, that lack of flexibility could actually kill your business.
Why Scale Out 14
¨ Continuous availability/redundancy:
¤ Failure is inevitable.
¤ One big system is a single point of failure.
¤ The recovery process could be long.
¤ Extended downtime is needed to restore one big machine.
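A rough back-of-the-envelope calculation shows why redundancy helps. The sketch below assumes that nodes fail independently and that any single surviving node can serve the application; neither assumption comes from the slides, and real systems often violate both. The function name is hypothetical.

```python
def combined_availability(node_availability, replicas):
    """Probability that at least one of `replicas` independent nodes is up.

    Assumption: failures are independent and one surviving node is enough.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The system is down only when every replica is down at the same time.
    return 1.0 - (1.0 - node_availability) ** replicas
```

Under these assumptions, one machine that is up 99% of the time gives 99% availability, while two such machines give roughly 99.99%.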
Why Scale Out 15
¨ Continuous upgrades:
¤ Building an application as one big unit makes it harder, or even impossible, to add or change pieces of code individually without bringing the entire system down.
¤ Better to decouple your application into concrete sets of services that can be maintained independently.
Why Scale Out 16
¨ Geographical distribution:
¤ There are cases where an application needs to be spread across data centers or geographical locations to handle disaster recovery scenarios or to reduce geographical latency.
¤ In these cases the application has to be distributed; putting it in a single box won't work.
Scaling out is non trivial 17
¨ Scale-out apps need a rewrite because the programming model is different
¨ Scale-out gains are not linear
¤ you have to deal with network overhead, transactions, and replication in operations that were previously done just by passing object references
¨ Beyond a few obvious cases, choosing between scale up and scale out is fairly hard
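One way to see why scale-out gains are not linear is a toy cost model. Everything in it is an assumption made for illustration: the work is taken to divide perfectly across nodes, and coordination cost (network, transactions, replication) is modeled as a fixed amount per additional node. Real overhead rarely behaves this simply, and the function name is hypothetical.

```python
def estimated_speedup(nodes, compute_time, overhead_per_node):
    """Speedup over a single node under a toy linear-overhead model.

    Assumption: compute divides evenly across nodes, and each extra node
    adds a fixed coordination cost.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    distributed_time = compute_time / nodes + overhead_per_node * (nodes - 1)
    return compute_time / distributed_time
```

With `compute_time=100.0` and `overhead_per_node=1.0`, ten nodes give a speedup of roughly 5.3 rather than 10, and twenty nodes do worse than ten (about 4.2), because the coordination cost starts to dominate.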
Further reading 18
¨ Dean, Jeff and Ghemawat, Sanjay. MapReduce: Simplified Data Processing on Large Clusters.
¤ http://research.google.com/archive/mapreduce.html
¨ Open-source implementation of MapReduce
¤ http://hadoop.apache.org/