hadoop desktop cluster


Upload: pjmorse

Post on 12-Jul-2015



Paul Morse | Hadoop, Hadoop Desktop Cluster, CriKit, Copyright 2012 – All Rights Reserved | September 30, 2012

Hadoop Desktop Cluster: Time for Hadoop on Your Desk!


Hadoop Desktop Cluster - CriKit

COPYRIGHT 2012 – PAUL MORSE – ALL RIGHTS RESERVED PAGE 1


Hadoop is Hot

The number of organizations that want to investigate and use Hadoop and other “Big Data” solutions is growing rapidly. It has been claimed that 50% of the world’s data will be held in Hadoop by 2015. That seems like an aggressive prediction, but there is no doubt that the entire “Big Data” sector is exploding, with no end to the expansion in the near term. One of the major inhibitors to broad adoption can be the high cost of the hardware needed to test and evaluate these new technologies. What is needed is a low-cost computing environment in which to test them and determine whether they are viable solutions for the organization. There are, of course, many public cloud and SaaS providers that have built competent solutions around Hadoop and other Big Data technologies, but many organizations are reluctant to put their private data in public environments simply to try out these new solutions. Further, it seems all the major hardware vendors are standing by to offer solutions costing hundreds of thousands of dollars, coupled with services offerings that rival or exceed the hardware costs – just to take a look at the technology.

What many organizations with a tight budget need is a compact, low-wattage, multi-node sandbox in which to test basic functionality and see whether a larger-scale Hadoop environment is warranted. Test small, go large if it works for you.

Enter CriKit

One option for testing big data solutions is CriKit, a Desktop Private Cloud. Originally created for testing and running Private Cloud software from a variety of companies, it is well suited to budget-conscious organizations that want to test the functionality of Hadoop or Cassandra and of add-on products such as Datameer, Tableau or myriad others.

Why CriKit?

CriKit is a unique desktop cluster solution. It was designed to combine compact size and low wattage with high compute power, with future reusability in mind. Because the system is broken into discrete compute nodes rather than housed in a proprietary case, the MicroServers can be reassigned as desktop devices when they have outlived their usefulness as servers. This extends the useful life of the MicroServers by 5 to 7 years. A dual hard drive CriKit MicroServer costs roughly $1,500.00 USD. If it has a multi-role useful life of 10 years, that equates to approximately 41 cents a day per node, or less than $2.00 a day for the 4 compute nodes in the base CriKit system. This cost compares very favorably with larger, more costly on-premise solutions and Public Cloud offerings from a variety of cloud vendors. Further, CriKit can be used in a hybrid cloud configuration in which Hadoop experts refine their Hadoop environment locally until it is as efficient as possible, then


burst to Public Clouds for large-scale processing. This helps organizations minimize their Public Cloud Hadoop computing spend, where simple mistakes can be very costly.
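
The per-day figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text ($1,500 per node, a 10-year life, 4 nodes); the 365-day year is an assumption, and the result covers the purchase price only, not power, switches or the workstation.

```python
# Figures taken from the text: $1,500 per dual-drive MicroServer,
# a 10-year multi-role life, and 4 compute nodes in the base system.
NODE_PRICE_USD = 1500.00
USEFUL_LIFE_YEARS = 10
NODES_IN_BASE_SYSTEM = 4
DAYS_PER_YEAR = 365  # assumption: leap days are ignored


def cost_per_day(price_usd: float, life_years: int) -> float:
    """Straight-line purchase cost per day over the useful life."""
    return price_usd / (life_years * DAYS_PER_YEAR)


per_node = cost_per_day(NODE_PRICE_USD, USEFUL_LIFE_YEARS)
per_cluster = per_node * NODES_IN_BASE_SYSTEM

print(f"Per node:    ${per_node:.2f} per day")
print(f"Per cluster: ${per_cluster:.2f} per day")
```

Under these assumptions the result is about $0.41 per node and $1.64 for the 4-node base system, consistent with the "less than $2.00 a day" figure.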

What is in a CriKit?

CriKit was designed to be very simple and to provide the compute power, networking and management necessary to build private clouds or data clusters. CriKit MicroServers can easily be added to increase the processing capability of the cluster, and the incremental cost of each added MicroServer is small compared with larger, proprietary offerings. Each desktop CriKit environment for a minimal, 4-node Hadoop cluster contains:

Computing Nodes - CriKit contains 4 energy-efficient compute nodes, each with an Intel Server Motherboard, a 64-bit Intel Xeon Server CPU, 16 GB of RAM, dual 1 Gb Ethernet Network Interface Controllers (NICs) and SATA III 2.5-inch drives of varying sizes and types. Each CriKit node can contain up to 2 SATA III 2.5-inch spinning, Hybrid or Solid State Drives (SSDs). Testing has shown that SSDs provide the best read performance by a wide margin.

1 Gb Ethernet Switch - CriKit comes standard with an 8-port, unmanaged, 1 Gb Ethernet switch. Managed switches, and switches with more ports for larger CriKit implementations, are available.

Keyboard, Video and Mouse Switch - A high-quality, 8-node DVI/USB switch is included with CriKit. This switch can be daisy-chained with additional switches to accommodate 511 CriKit compute nodes and one Management Workstation.

Management/Development Workstation - CriKit comes with a high-powered workstation to manage the cloud environment and provide high developer productivity. Purchasers can select the components of the workstation (CPU, disk, memory, etc.) or decide not to buy the workstation and use their own desktop or laptop machine as the cluster management station.

Architecture

A 4-node CriKit cluster provides the minimum hardware necessary to run a Hadoop cluster. For testing and evaluation, 1 of the 4 nodes can host all the Hadoop-related management functions and the 3 remaining nodes can be the compute or slave nodes. With 4 cores and 8 threads in each CPU, plus 16 GB of RAM and up to 1.2 TB of SSD storage in each node, there is enough compute horsepower, memory and high-speed storage to adequately test Hadoop with moderate amounts of data.
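
To give "moderate amounts of data" a rough scale, the following sketch totals the resources of the 3 slave nodes in the 1 + 3 layout described above. The per-node figures come from the text. The replication factor of 3 is an assumption (it is Hadoop's default `dfs.replication`), and the estimate ignores space taken by the operating system and by intermediate job output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    cores: int = 4        # from the text
    threads: int = 8      # from the text
    ram_gb: int = 16      # from the text
    ssd_tb: float = 1.2   # upper bound quoted in the text


TOTAL_NODES = 4
MASTER_NODES = 1          # one node hosts the management functions
HDFS_REPLICATION = 3      # assumption: Hadoop's default dfs.replication

# The remaining nodes act as compute/slave nodes.
workers = [Node() for _ in range(TOTAL_NODES - MASTER_NODES)]

worker_cores = sum(n.cores for n in workers)
worker_threads = sum(n.threads for n in workers)
worker_ram_gb = sum(n.ram_gb for n in workers)
raw_storage_tb = sum(n.ssd_tb for n in workers)

# Each HDFS block is stored HDFS_REPLICATION times, so usable space
# is the raw space divided by the replication factor.
usable_hdfs_tb = raw_storage_tb / HDFS_REPLICATION

print(f"Worker cores/threads: {worker_cores}/{worker_threads}")
print(f"Worker RAM:           {worker_ram_gb} GB")
print(f"Raw worker storage:   {raw_storage_tb:.1f} TB")
print(f"Usable HDFS capacity: {usable_hdfs_tb:.1f} TB")
```

With these assumptions the three slave nodes offer 12 cores, 48 GB of RAM and roughly 1.2 TB of replicated HDFS space. Lowering the replication factor for a test cluster would raise the usable figure at the cost of fault tolerance.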


Further, if you want to use CriKit as a 4-node private cloud platform and run Hadoop in virtual machines to test Hadoop scalability over many virtual machine nodes, that is a viable configuration option as well.
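
For the virtual machine option, a quick sizing estimate shows how many Hadoop VMs such a cluster might host. Only the node RAM, thread count and node count come from the text. The memory reserved for the host, the RAM per VM and the vCPUs per VM are illustrative assumptions, not CriKit recommendations.

```python
NODE_RAM_GB = 16       # from the text
NODE_THREADS = 8       # from the text
PHYSICAL_NODES = 4     # from the text

HOST_RESERVED_GB = 2   # assumption: RAM kept back for the hypervisor/host OS
VM_RAM_GB = 2          # assumption: RAM given to each Hadoop VM
VM_VCPUS = 1           # assumption: one vCPU per VM, no CPU overcommit


def vms_per_node(ram_gb: int = NODE_RAM_GB,
                 threads: int = NODE_THREADS,
                 reserved_gb: int = HOST_RESERVED_GB,
                 vm_ram_gb: int = VM_RAM_GB,
                 vm_vcpus: int = VM_VCPUS) -> int:
    """VMs that fit on one node, limited by whichever runs out first."""
    by_ram = (ram_gb - reserved_gb) // vm_ram_gb
    by_cpu = threads // vm_vcpus
    return min(by_ram, by_cpu)


per_node = vms_per_node()
total_vms = per_node * PHYSICAL_NODES

print(f"VMs per physical node: {per_node}")
print(f"VMs across cluster:    {total_vms}")
```

With these example values memory is the limiting resource: 7 VMs per node, or 28 across the 4 nodes. Changing the assumed VM size changes the result directly, so the numbers are only a starting point for planning.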

Summary

A growing number of organizations want to test and evaluate Hadoop and other “Big Data” solutions and add-on products. CriKit provides a low-cost, low-wattage, compact and quiet desktop computing platform that is ideal for organizations on a tight budget.

http://www.crikit.info

http://www.cloudademia.com

http://www.usmicro.com