
Linux/Apache on ARM Processors

In The Case for Low-Cost, Low-Power Servers, I made the argument that the right measures of server efficiency are work done per dollar and work done per joule. Purchasing servers on single-dimensional metrics like performance or power or even cost alone makes no sense at all. Single-dimensional purchasing leads to micro-optimizations that push one dimension to the detriment of others. Blade servers have been one of my favorite examples of optimizing the wrong metric (Why Blade Servers aren't the Answer to All Questions). Blades often trade increased cost to achieve server density. But density doesn't improve work done per dollar, nor does it produce better work done per joule. In fact, density often takes work done per joule in the wrong direction by driving higher power consumption due to the challenge of cooling higher power densities.

There is no question that selling in high volume drives price reductions, so client and embedded parts have the potential to be the best price/performing components.

And, as focused as the server industry has been on power of late, the best work is still in the embedded systems world, where a cell phone designer would sell their soul for a few more amp-hours if they could have them without extra size or extra weight. Nobody focuses on power as much as embedded systems designers, and many of the tricks arriving in the server world showed up years ago in embedded devices.

A very common processor used in cell phone applications is the ARM. The ARM business model is somewhat unusual in that ARM sells a processor design, and that design is then taken and customized by many teams including Texas Instruments, Samsung, and Marvell. These processors find their way into cell phones, printers, networking gear, low-end Storage Area Networks, Network Attached Storage devices, and other embedded applications. The processors produce respectable performance, great price/performance, and absolutely amazing power/performance.

Could this processor architecture be used in server applications? The first and most obvious pushback is that it's a different instruction set architecture, but server software stacks really are not that complex. If you can run Linux and Apache, some web workloads can be hosted. There are many Linux ports to ARM -- the software will run. The next challenge, and this one is the hard one, is whether the workload partitions into sufficiently fine slices to be hosted on servers built using low-end processors. Memory size limitations are particularly hard to work around in that ARM designs have the entire system on the chip, including the memory controller, and none I've seen address more than 2GB. But, for those workloads that do scale sufficiently finely, ARM can work.

I've been interested in seeing this done for a couple of years and have been watching ARM processors scale up for quite some time. Well, we now have an example. Check out http://www.linux-arm.org/Main/LinuxArmOrg. That web site is hosted on 7 servers, each running the following:

- Single 1.2GHz ARM processor, Marvell MV78100
- 1 disk
- 1.5 GB DDR2 with ECC!
- Debian Linux
- Nginx web proxy/load balancer (a config sketch of this front end follows below)
- Apache web server

Note that, unlike Intel Atom-based servers, this ARM-based solution has the full ECC memory support we want in server applications (actually, you really want ECC in all applications, from embedded through client to servers). Clearly this solution won't run many server workloads, but it's a step in the right direction.
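I don't have the actual Nginx configuration from the linux-arm.org servers, but a minimal sketch of this style of front end, Nginx terminating HTTP and load balancing across Apache instances, might look something like the nginx.conf fragment below. The backend addresses, ports, and host name are invented for illustration, and the worker settings are simply chosen to stay modest on a single-core, 1.5 GB machine.

    # Hypothetical nginx.conf sketch -- not the actual configuration from the post.
    worker_processes 1;                  # one worker for the single ARM core
    events { worker_connections 256; }   # keep connection state small on 1.5 GB

    http {
        upstream apache_backend {
            # Apache instances to balance across; these addresses are invented.
            server 127.0.0.1:8080;       # local Apache
            server 127.0.0.1:8081;       # a second Apache worker, if present
        }

        server {
            listen 80;
            server_name example.org;     # placeholder host name

            location / {
                proxy_pass http://apache_backend;          # hand the request to Apache
                proxy_set_header Host $host;               # preserve the original host header
                proxy_set_header X-Real-IP $remote_addr;   # pass the client address through
            }
        }
    }

Adding capacity in an arrangement like this is just a matter of listing more servers in the upstream block; Nginx round-robins across them by default.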
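To make the work-done-per-dollar and work-done-per-joule comparison concrete, here's a toy calculation. Every number in it is invented for illustration rather than taken from real hardware pricing; it previews the scale-down trade-off discussed in the next paragraph, where halving the processor cost and throughput does not halve the total server cost, but a large drop in power can still win on work per joule.

    # Toy comparison of the two server-efficiency metrics from this post:
    # work done per dollar and work done per joule. All numbers are made up.

    def work_per_dollar(throughput, server_cost):
        # requests/sec per dollar of total server cost
        return throughput / server_cost

    def work_per_joule(throughput, watts):
        # requests/sec divided by joules/sec (watts) = requests per joule
        return throughput / watts

    # Hypothetical baseline server: the processor is only part of the total cost.
    base = {"cpu": 800.0, "other": 1200.0, "tput": 1000.0, "watts": 300.0}

    # Hypothetical scaled-down server: processor cost and throughput both halved,
    # but the surrounding components (disk, board, power supply) do not halve.
    small = {"cpu": 400.0, "other": 1000.0, "tput": 500.0, "watts": 120.0}

    for name, s in [("baseline", base), ("scaled-down", small)]:
        cost = s["cpu"] + s["other"]
        print(f'{name:11s}  work/dollar={work_per_dollar(s["tput"], cost):.3f}  '
              f'work/joule={work_per_joule(s["tput"], s["watts"]):.3f}')

    # With these invented numbers the scaled-down box loses on work per dollar
    # (0.357 vs 0.500) because the non-processor costs dominate, but wins on
    # work per joule (4.167 vs 3.333) -- the trade-off discussed below.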
The problems I have had when scaling systems down to embedded processors have been dominated by two issues: 1) some workloads don't scale down to sufficiently small slices (what I like to call bad software but, as someone who spent much of his career working on database engines, I probably should know better), and 2) surrounding component and packaging overhead. Basically, as you scale down the processor expense, other server costs begin to dominate. For example, if you halve the processor cost and also halve the throughput, it's potentially a step backwards, since all the other components in the server didn't also halve in cost. So, in this example, you would get half the throughput at something more than half the cost. Generally not good. But what's interesting are those cases where it's non-linear in the other direction: cut the cost to N% with throughput at M%, where M is much more than N. As these system-on-a-chip (SoC) server solutions improve, this is going to be more common.

It's not always a win, based upon the discussion above, but it is a win for some workloads today. And, if we can get multi-core versions of ARM, it'll be a clear win for many more workloads. Actually, the Marvell MV78200 is a two-core SoC, but it's not cache coherent, which isn't a useful configuration in most server applications.

The ARM is a clear win on work done per dollar and work done per joule for some workloads. If a 4-core, cache-coherent version were available with a reasonable memory controller, we would have a very nice server processor with record-breaking power consumption numbers. Thanks for the great work, ARM and Marvell. I'm looking forward to tracking this work closely and I love the direction it's taking. Keep pushing.

--jrh

James Hamilton
e: [email protected]
w: http://www.mvdirona.com
b: http://blog.mvdirona.com/ | http://perspectives.mvdirona.com