Big data applications on Cloudera Hadoop
TRANSCRIPT
Hadoop as an application framework
■ Hadoop makes a great solution for web-scale data warehousing and batch data processing
■ But there’s more to Hadoop than big analytics queries alone
■ Cloudera Hadoop offers a compelling foundation for building and deploying large-scale distributed internet applications
Anatomy of a Hadoop-backed internet application
HBase database service
SolrCloud search service
Spark batch processing tier
Cache service
Web UI service
API service
HDFS filesystem
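One way the API, cache and database tiers in this picture could fit together is a cache-aside read path. The sketch below is an assumption, not something the slides spell out: `CacheAsideReader` is a hypothetical name, and plain dictionaries stand in for the cache service and for an HBase lookup.

```python
from typing import Callable, Dict, Optional


class CacheAsideReader:
    """Hypothetical API-tier read path: try the cache, then fall back to the database."""

    def __init__(self, cache: Dict[str, str],
                 load_from_db: Callable[[str], Optional[str]]):
        self.cache = cache                # stand-in for the cache service
        self.load_from_db = load_from_db  # stand-in for an HBase row lookup
        self.db_reads = 0                 # counts how often the database was hit

    def get(self, key: str) -> Optional[str]:
        # Serve from the cache when the key is already there.
        if key in self.cache:
            return self.cache[key]
        # Cache miss: read from the database and remember the result.
        self.db_reads += 1
        value = self.load_from_db(key)
        if value is not None:
            self.cache[key] = value
        return value


# Usage with in-memory stand-ins (no real HBase or Memcached involved).
fake_hbase = {"user:1": "alice"}
reader = CacheAsideReader(cache={}, load_from_db=fake_hbase.get)
first = reader.get("user:1")   # miss, goes to the database stand-in
second = reader.get("user:1")  # hit, served from the cache stand-in
```

Missing keys are not cached here, so repeated lookups for an unknown key keep reaching the database stand-in.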
Hadoop as an operating system
■ Hadoop offers many of the foundation components you need to build web-scale applications:
  ■ Message queues [Kafka]
  ■ Stream processing [Spark]
  ■ Batch processing [Spark, MapReduce]
  ■ Database [HBase]
  ■ Search [SolrCloud]
  ■ Storage [HDFS]
Cloudera Manager integration
■ You can use Cloudera Manager to deploy, operate, monitor and alert on your service’s custom components
  ■ Stuff like Memcached, your APIs, and your web UI
  ■ https://github.com/cloudera/cm_ext/wiki
■ You package your custom service components as parcels for distribution across the cluster
  ■ A robust framework for packaging, versioning and upgrades
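As a rough sketch of the packaging step: a parcel repository is typically a directory of `.parcel` archives plus a manifest listing each file with its SHA-1 hash. The `NAME-VERSION-DISTRO.parcel` naming and the `parcelName`, `hash` and `lastUpdated` fields below follow the cm_ext wiki as I understand it and should be checked against it. The helper names, the `MYAPI` parcel and the no-hyphen rule for names are assumptions made for illustration.

```python
import hashlib
import json
from typing import Dict, List


def parcel_filename(name: str, version: str, distro: str) -> str:
    """Build a parcel file name of the form NAME-VERSION-DISTRO.parcel."""
    # Assumption: the name carries no hyphen, so the fields can be split apart again.
    if "-" in name:
        raise ValueError("parcel name must not contain '-'")
    return f"{name}-{version}-{distro}.parcel"


def manifest_entry(filename: str, payload: bytes) -> Dict[str, str]:
    """Describe one parcel file by its name and the SHA-1 of its bytes."""
    return {"parcelName": filename, "hash": hashlib.sha1(payload).hexdigest()}


def build_manifest(entries: List[Dict[str, str]], last_updated_ms: int) -> str:
    """Serialise the manifest that would sit next to the parcel files."""
    return json.dumps({"lastUpdated": last_updated_ms, "parcels": entries}, indent=2)


# Usage: a made-up parcel for a custom API service, with placeholder content.
fname = parcel_filename("MYAPI", "1.0.0", "el6")
entry = manifest_entry(fname, b"placeholder parcel bytes")
manifest = build_manifest([entry], last_updated_ms=0)
```

In a real build the payload would be the bytes of the parcel archive itself, and `last_updated_ms` would be the current time in milliseconds.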
Hadoop cluster
Component view
[Diagram: Web 1, Web 2; Memcached 1 to N; SolrCloud 1 to N; HBase 1, 2 to N; Spark 1, 2 to N; Hadoop Master / CM (shown twice); Network Services; Firewall Services; Distribution Server]
Deployment considerations
■ You will still need the support of foundation network services like DNS, NTP and firewall
■ You may still need to deploy HAProxy - it can run on nodes in the Hadoop cluster with a floating IP
  ■ http://blog.cloudera.com/blog/2013/08/how-to-achieve-higher-availability-for-hue/
■ Use Linux Control Groups (CGroups) to guarantee resource shares - configure from CM
  ■ http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cm_mc_cgroups.html
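A small worked example of what "resource shares" means for CPU, assuming the cgroup v1 `cpu.shares` behaviour: shares are relative weights, and they only take effect when the CPUs are contended. The role names and share values are made up for illustration, and the function is a hypothetical helper, not part of Cloudera Manager.

```python
from typing import Dict


def cpu_fraction_under_contention(shares: Dict[str, int]) -> Dict[str, float]:
    """Fraction of CPU time each group is guaranteed when every group is busy."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one group needs a positive share")
    # Each group's guarantee is its weight divided by the sum of all weights.
    return {name: value / total for name, value in shares.items()}


# Usage: HBase gets twice the weight of the search and API roles.
fractions = cpu_fraction_under_contention({"hbase": 2048, "solr": 1024, "api": 1024})
# fractions == {"hbase": 0.5, "solr": 0.25, "api": 0.25}
```

When a group is idle, the others can use its unused CPU time, so these fractions are a floor under load rather than a cap.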
Wrap up
■ By extending Cloudera Manager, Hadoop can be used to build, deploy and operate complete, web-scale applications in a consistent and predictable way
■ Hadoop can offer much more than data warehousing alone
■ But there is still a little way to go until Hadoop becomes a fully fledged Data Centre scale OS