big data applications on cloudera hadoop

Big Data consulting
Big applications on Cloudera Hadoop

Uploaded by robert-gibbon on 16-Jul-2015

Hadoop as an application framework

■ Hadoop is a great solution for web-scale data warehousing and batch data processing
■ But there's more to Hadoop than big analytics queries alone
■ Cloudera Hadoop offers a compelling foundation for building and deploying large-scale distributed internet applications

Anatomy of a Hadoop-backed internet application

■ HBase database service
■ SolrCloud search service
■ Spark batch processing tier
■ Cache service
■ Web UI service
■ API service
■ HDFS filesystem
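The slide names the tiers but not how they interact. As one hedged illustration, the Python sketch below shows a read-through (cache-aside) pattern in which the API service consults the cache service first and falls back to the HBase database service on a miss. `HBaseStore`, `CacheService` and `ApiService` are hypothetical in-memory stand-ins invented for this example; they are not the HBase or Memcached client APIs, which a real service would wrap instead.

```python
from typing import Dict, Optional


class HBaseStore:
    """Hypothetical stand-in for the HBase database service (row key -> value)."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}
        self.reads = 0  # counts how often the backing store is read

    def put(self, row_key: str, value: str) -> None:
        self._rows[row_key] = value

    def get(self, row_key: str) -> Optional[str]:
        self.reads += 1
        return self._rows.get(row_key)


class CacheService:
    """Hypothetical stand-in for the cache service (e.g. Memcached)."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._entries.get(key)

    def set(self, key: str, value: str) -> None:
        self._entries[key] = value

    def delete(self, key: str) -> None:
        self._entries.pop(key, None)


class ApiService:
    """API tier: serves reads from the cache, falling back to HBase on a miss."""

    def __init__(self, cache: CacheService, store: HBaseStore) -> None:
        self.cache = cache
        self.store = store

    def get_item(self, item_id: str) -> Optional[str]:
        cached = self.cache.get(item_id)
        if cached is not None:
            return cached  # cache hit: HBase is not touched
        value = self.store.get(item_id)  # cache miss: read from HBase
        if value is not None:
            self.cache.set(item_id, value)  # populate the cache for later reads
        return value

    def update_item(self, item_id: str, value: str) -> None:
        self.store.put(item_id, value)  # HBase remains the source of truth
        self.cache.delete(item_id)  # invalidate so the next read refetches


if __name__ == "__main__":
    store = HBaseStore()
    store.put("user:42", "alice")
    api = ApiService(CacheService(), store)
    print(api.get_item("user:42"), api.get_item("user:42"), store.reads)
```

Running it prints `alice alice 1`: the second read is served from the cache, so HBase is read only once. Invalidating on write rather than updating the cache in place keeps HBase as the single source of truth, at the cost of one extra HBase read after each update.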

Hadoop as an operating system

■ Hadoop offers many of the foundation components you need to build web-scale applications:
  ■ Message queues [Kafka]
  ■ Stream processing [Spark]
  ■ Batch processing [Spark, MapReduce]
  ■ Database [HBase]
  ■ Search [SolrCloud]
  ■ Storage [HDFS]

Cloudera Manager integration

■ You can use Cloudera Manager to deploy, operate, monitor and alert on your service's custom components
■ Custom components include things like Memcached, your APIs and your web UI
■ https://github.com/cloudera/cm_ext/wiki
■ You package your custom service components as parcels for distribution across the cluster
■ Parcels give you a robust framework for packaging, versioning and upgrades
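As a rough sketch of what packaging a custom component as a parcel involves, the Python below writes a gzipped tarball named `NAME-VERSION-DISTRO.parcel` whose top-level directory is `NAME-VERSION` and which carries a `meta/parcel.json` descriptor. This layout follows the conventions described on the cm_ext wiki as we understand them; the minimal `parcel.json` here (only `schema_version`, `name` and `version`) is an assumption, and real parcels normally declare more fields, so check the wiki's schema before relying on it. The `MYSVC` name and the `bin/run.sh` script are hypothetical.

```python
import io
import json
import tarfile
import tempfile
from pathlib import Path
from typing import Dict


def parcel_filename(name: str, version: str, distro: str) -> str:
    """Return NAME-VERSION-DISTRO.parcel, e.g. MYSVC-1.0-1-el6.parcel."""
    # This sketch forbids '-' in the name so the file name splits unambiguously;
    # the version may itself contain hyphens.
    if "-" in name:
        raise ValueError("parcel name must not contain '-'")
    return f"{name}-{version}-{distro}.parcel"


def build_parcel(name: str, version: str, distro: str,
                 files: Dict[str, bytes], out_dir: str) -> Path:
    """Write a gzipped tarball whose single top-level directory is NAME-VERSION.

    `files` maps paths relative to that directory (e.g. "bin/run.sh") to their
    contents. A minimal meta/parcel.json is generated (assumed fields only).
    """
    top_dir = f"{name}-{version}"
    descriptor = {"schema_version": 1, "name": name, "version": version}
    entries = dict(files)
    entries["meta/parcel.json"] = json.dumps(descriptor, indent=2).encode("utf-8")

    out_path = Path(out_dir) / parcel_filename(name, version, distro)
    with tarfile.open(str(out_path), "w:gz") as tar:
        for rel_path, data in sorted(entries.items()):
            info = tarfile.TarInfo(name=f"{top_dir}/{rel_path}")
            info.size = len(data)
            # Scripts under bin/ get mode 0755; everything else gets 0644.
            info.mode = 0o755 if rel_path.startswith("bin/") else 0o644
            tar.addfile(info, io.BytesIO(data))
    return out_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        parcel = build_parcel("MYSVC", "1.0-1", "el6", {"bin/run.sh": b"echo hi\n"}, tmp)
        with tarfile.open(str(parcel), "r:gz") as tar:
            print(parcel.name, tar.getnames())
```

Here the demo writes `MYSVC-1.0-1-el6.parcel` containing `MYSVC-1.0-1/bin/run.sh` and `MYSVC-1.0-1/meta/parcel.json`. Distributing and activating the parcel across the cluster would then be Cloudera Manager's job rather than this script's.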

Component view

[Diagram: Hadoop cluster. Nodes shown: Web 1, Web 2; HBase 1 … HBase N; SolrCloud 1 … SolrCloud N; Spark 1 … Spark N; Memcached 1 … Memcached N; two Hadoop Master / CM nodes. Also shown: Network Services, Firewall Services, Distribution Server.]

Deployment considerations

■ You will still need the support of foundation network services such as DNS, NTP and firewalls
■ You may still need to deploy HAProxy; it can run on nodes in the Hadoop cluster behind a floating IP
  ■ http://blog.cloudera.com/blog/2013/08/how-to-achieve-higher-availability-for-hue/
■ Use Linux Control Groups (cgroups) to guarantee resource shares; configure them from CM
  ■ http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cm_mc_cgroups.html
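The cgroup CPU controller treats `cpu.shares` as relative weights rather than hard caps: when groups compete for CPU, each receives time in proportion to its weight, and a group that is idle leaves its portion to the others. The Python below is a simplified model of that arithmetic (one flat level of groups, ignoring per-core effects and scheduling granularity); the role names and weights are hypothetical, with 1024 being the kernel's default `cpu.shares` value.

```python
from fractions import Fraction
from typing import AbstractSet, Dict


def cpu_fractions(shares: Dict[str, int], busy: AbstractSet[str]) -> Dict[str, Fraction]:
    """Return each group's fraction of total CPU when only `busy` groups want CPU.

    Simplified model of cgroup cpu.shares: busy groups split the CPU in
    proportion to their weights; idle groups get (and need) nothing.
    """
    total = sum(shares[group] for group in busy)
    return {
        group: Fraction(weight, total) if group in busy else Fraction(0)
        for group, weight in shares.items()
    }


# Hypothetical per-role weights; 1024 is the kernel's default cpu.shares.
SHARES = {"hbase_regionserver": 2048, "solr_server": 1024, "memcached": 1024}

if __name__ == "__main__":
    everyone = cpu_fractions(SHARES, set(SHARES))
    no_cache = cpu_fractions(SHARES, {"hbase_regionserver", "solr_server"})
    print(everyone["hbase_regionserver"], no_cache["hbase_regionserver"])  # 1/2 2/3
```

With weights 2048/1024/1024 and all three roles busy, HBase gets 1/2 of the CPU and the others 1/4 each; if Memcached goes idle, the split becomes 2/3 and 1/3. That is how shares can guarantee a minimum fraction under contention without wasting capacity when the cluster is quiet.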

Wrap up

■ By extending Cloudera Manager, Hadoop can be used to build, deploy and operate complete web-scale applications in a consistent and predictable way
■ Hadoop can offer much more than data warehousing alone
■ But there is still some way to go before Hadoop becomes a fully fledged data-centre-scale OS