Big Data in a public cloud

DESCRIPTION

All these large data sets are so big that it's difficult to manage them with traditional tools. Distributed computing is an approach to solving that problem: first the data needs to be mapped, then it can be analyzed, or reduced.

TRANSCRIPT

Big Data in a public cloud

Maximize RAM without paying for overhead CPU, and other price/performance tricks

Big Data

• Big Data: just another way of saying large data sets

• All of them are so large that it's difficult to manage them with traditional tools

• Distributed computing is an approach to solve that problem

• First the data needs to be Mapped

• Then it can be analyzed – or Reduced

MapReduce

• Instead of talking about it, let's just see some code
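The transcript does not preserve the code that was shown on the slide, so here is a minimal, self-contained Python sketch of the two phases; the word-count task is my own placeholder example, not necessarily what the talk used.

from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in one document.
    for word in document.split():
        yield word.lower(), 1

def reduce_phase(pairs):
    # Reduce: sum the counts emitted for each word.
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

if __name__ == "__main__":
    docs = ["Big Data in a public cloud", "Big Data is just large data sets"]
    mapped = [pair for doc in docs for pair in map_phase(doc)]
    print(reduce_phase(mapped))

The key property is that map_phase touches each document independently, which is exactly what lets the work be spread over many machines.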

Map/Reduce – distributed work

But that’s perfect for public cloud!
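To illustrate why the pattern fits a fleet of cloud instances, here is a sketch that runs the map phase in parallel with Python's multiprocessing module; in a real deployment each shard would go to a separate cloud instance rather than a local process, so this only shows the shape of the workload.

from multiprocessing import Pool
from collections import Counter

def map_worker(document):
    # Each worker counts words in its own shard of the data.
    return Counter(document.lower().split())

def reduce_counts(partials):
    # Reduce: merge the partial counts produced by every worker.
    total = Counter()
    for partial in partials:
        total += partial
    return total

if __name__ == "__main__":
    shards = ["big data in a public cloud",
              "distributed computing solves the size problem",
              "map first then reduce"]
    with Pool(processes=3) as pool:
        partial_counts = pool.map(map_worker, shards)   # the distributed part
    print(reduce_counts(partial_counts))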

CPU / RAM

• Mapping is not CPU intensive

• Reducing is (usually) not CPU intensive

• Speed: load the data into RAM, or it hits the HDD and creates iowait (a small check for this follows below)
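One way to make the iowait point concrete: the sketch below samples the share of CPU time spent waiting on disk I/O using the third-party psutil library (the iowait field is reported on Linux; the 10-second window is an arbitrary choice for illustration).

import time
import psutil  # third-party: pip install psutil

def iowait_share(interval=10):
    # Fraction of CPU time spent in iowait over the interval (Linux only).
    before = psutil.cpu_times()
    time.sleep(interval)
    after = psutil.cpu_times()
    total = sum(after) - sum(before)
    waited = after.iowait - before.iowait
    return waited / total if total else 0.0

if __name__ == "__main__":
    print(f"iowait share over the last 10 s: {iowait_share():.1%}")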

First point – a lot of RAM

Ah…now they have RAM instances

I wanted more RAM, I said ☹

What about a MapReduce PaaS?

• Be aware of data lock-in

• Be aware of forced tool set – limits your workflow

• …and I just don’t like it so much as a developer; it’s a control thing

There is another way

What we do

What is compute like – why commodity?

Now let's get to work

• Chef & Puppet – deploy and config (a minimal bootstrap sketch follows after this list)

• Clone; scale up and down

• Problem solved?
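As a rough picture of the "deploy and config" step, here is a first-boot bootstrap sketch for a freshly cloned worker: it simply shells out to the Puppet agent, assuming the golden image already ships with the agent installed and pointed at your Puppet master (a Chef setup would run chef-client instead).

import subprocess

def bootstrap_worker():
    # Pull and apply this node's configuration once, in the foreground.
    # 'puppet agent --test' exits with code 2 when changes were applied,
    # so a non-zero exit is not treated as a hard failure here.
    subprocess.run(["puppet", "agent", "--test"], check=False)

if __name__ == "__main__":
    bootstrap_worker()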

And then there is reality…iowait

ioWAIT

• 1,000 IOPS on a high-end enterprise HDD

• 500,000 IOPS on a high-end SSD (a quick back-of-the-envelope comparison follows below)

• Public cloud on HDD should be forbidden
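To put those numbers in perspective, here is a quick comparison using the IOPS figures from the slide (the one-million-random-reads workload is just an arbitrary example):

RANDOM_READS = 1_000_000   # arbitrary example workload

for name, iops in (("enterprise HDD", 1_000), ("high-end SSD", 500_000)):
    seconds = RANDOM_READS / iops
    print(f"{name}: {RANDOM_READS:,} random reads take roughly {seconds:,.0f} s")

At 1,000 IOPS that is on the order of a thousand seconds of pure waiting; at 500,000 IOPS it is about two seconds.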

Announcing: SSD priced as HDD

Who does this kind of workload, and how?

• CERN – the LHC experiments to find the Higgs Boson

• 200 Petabytes per year

• 200,000 CPU days / day on hundreds of partner grids and clouds

• Monte Carlo simulations (CPU intensive)

The CERN case

• Golden VM images with tools, from which they clone

• A set of coordinator servers, which scale the worker nodes up and down via the provisioning API (clone, then Puppet config)

• The coordinator servers manage the workload on each worker node running Monte Carlo simulations

• Federated cloud brokers such as Enstratus and Slipstream

• Self-healing architecture – the uptime of a single worker node is not critical; just spin up a new worker node instance (a rough sketch follows after this list)
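Here is a rough Python sketch of the coordinator idea described above: clone workers from the golden image, configure them, and replace any worker that fails a health check. The FakeProvisioningAPI class and all of its methods are invented placeholders for illustration, not CERN's or any vendor's actual API.

import itertools

class FakeProvisioningAPI:
    # Stand-in for a real IaaS provisioning API.
    _ids = itertools.count(1)

    def clone(self, image):
        worker = f"worker-{next(self._ids)}"
        print(f"cloned {image} -> {worker}")
        return worker

    def apply_config(self, worker, role):
        print(f"puppet config: {worker} as {role}")

    def destroy(self, worker):
        print(f"destroyed {worker}")

    def is_healthy(self, worker):
        return not worker.endswith("-3")   # pretend worker-3 has died

def provision(api, image="golden-montecarlo"):
    worker = api.clone(image)                    # clone the golden VM image
    api.apply_config(worker, role="montecarlo")  # then apply the Puppet config
    return worker

def reconcile(api, workers):
    # Self-healing pass: a dead worker is simply destroyed and replaced.
    healthy = []
    for worker in workers:
        if api.is_healthy(worker):
            healthy.append(worker)
        else:
            api.destroy(worker)
            healthy.append(provision(api))
    return healthy

if __name__ == "__main__":
    api = FakeProvisioningAPI()
    workers = [provision(api) for _ in range(4)]   # scale up to four workers
    workers = reconcile(api, workers)              # worker-3 gets replaced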

The loyalty program case

• The customer runs various loyalty programs

• Plain vanilla Hadoop Map/Reduce (a minimal mapper/reducer sketch follows after this list):

• Chef and Puppet to deploy and config worker nodes

• Lots of RAM, little CPU

• Self-healing architecture – the uptime of a single worker node is not critical; just spin up a new worker node instance
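Since the slide calls this plain vanilla Hadoop Map/Reduce, here is what a minimal Hadoop Streaming job could look like with a Python mapper and reducer; the idea of summing points per loyalty-card member, and the tab-separated input format, are assumptions made for the sake of the example. The two scripts would be wired together with the hadoop-streaming jar's -mapper, -reducer, -input and -output options.

# mapper.py -- Hadoop Streaming mapper (its own file; Hadoop feeds raw lines on stdin)
# Assumed input format: member_id<TAB>points_earned
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 2:
        member_id, points = fields
        print(f"{member_id}\t{points}")   # emit key<TAB>value

# reducer.py -- Hadoop Streaming reducer (a separate file; input arrives sorted by key)
import sys

current_member, total = None, 0
for line in sys.stdin:
    line = line.rstrip("\n")
    if not line:
        continue
    member_id, points = line.split("\t")
    if member_id != current_member:
        if current_member is not None:
            print(f"{current_member}\t{total}")
        current_member, total = member_id, 0
    total += int(points)
if current_member is not None:
    print(f"{current_member}\t{total}")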

Find IaaS suppliers with

• Unbundled resources

• Short billing cycles

• New equipment: what's your life cycle?

• SSD as compute storage

• Allows cross connects from your equipment in the DC

• CloudSigma, ElasticHost, ProfitBricks to mention a few

What we do
