Cloudera Manager – APIs & Extensibility
Patrick Angeles, Director Field Technical Services, December 2013
CONFIDENTIAL - RESTRICTED
Cloudera Manager: End-to-End Administration for CDH
1. Manage: Easily deploy, configure & optimize clusters
2. Monitor: Maintain a central view of all activity
3. Diagnose: Easily identify and resolve issues
4. Integrate: Use Cloudera Manager with existing tools
©2013 Cloudera, Inc. All Rights Reserved.
Integrating with your IT Mgmt tools
[Diagram: Cloudera Manager (Hadoop Operations) connected to Datacenter Operations tools: installation & deployment tools (e.g. Chef, Puppet), monitoring tools (e.g. Orion, Tivoli, BMC) and alerting tools (e.g. Nagios, SNMP).]
Various options for integrating Cloudera Manager into your existing datacenter operations/tools:
• Cloudera Manager API
  • Introduced in CM4 (June 2012)
  • Installation & deployment
  • Monitoring
• SNMP alerts
  • Introduced in CM4.5 (Feb 2013)
• And more…
  • Monitoring via ‘tsquery’ (Feb 2013)
  • User-defined triggers/alarms (new for C5!)
  • Service extensibility (new for C5!)
Cloudera Manager (CM) API
• API access was a new feature introduced in Cloudera Manager 4.0, providing programmatic access to cluster operations (such as configuration and restart) and monitoring information (such as health and metrics).
• The CM API is an HTTP REST API using JSON serialization. It is served on the same host and port as the CM web UI and does not require an extra process or extra configuration. API users have the same privileges as they do in the web UI.
• Docs & examples: http://cloudera.github.io/cm_api/ and https://github.com/cloudera/cm_api
• Java/Python clients: http://blog.cloudera.com/blog/2013/05/how-to-automate-your-hadoop-cluster-from-java/
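Because the API is plain HTTP with JSON, any HTTP client can drive it. The sketch below wraps curl in small helpers. The host name, the admin/admin credentials, the `v1` version segment and the helper names `cm_url`, `cm_get` and `cm_post` are illustrative assumptions, not values from the slides. The only fixed fact used is 7180, CM's default web UI port.

```shell
#!/bin/bash
# Minimal sketch of calling the CM REST API with curl.
# ASSUMPTIONS (placeholders, not from the slides): host name, admin/admin
# credentials, API version "v1". 7180 is CM's default web UI port.
CM_HOST=${CM_HOST:-cm-host.example.com}
CM_PORT=${CM_PORT:-7180}
CM_API_VERSION=${CM_API_VERSION:-v1}
CM_USER=${CM_USER:-admin}
CM_PASS=${CM_PASS:-admin}

# Build the full URL for a resource path such as "clusters"
cm_url() {
  printf 'http://%s:%s/api/%s/%s\n' "$CM_HOST" "$CM_PORT" "$CM_API_VERSION" "$1"
}

# GET a resource; CM answers with a JSON document
cm_get() {
  curl -s -u "$CM_USER:$CM_PASS" "$(cm_url "$1")"
}

# POST to a command resource, e.g. a service restart
cm_post() {
  curl -s -X POST -u "$CM_USER:$CM_PASS" "$(cm_url "$1")"
}
```

For example, `cm_get clusters` lists the managed clusters. `cm_post clusters/cluster1/services/hdfs1/commands/restart` asks CM to restart a service. The cluster name `cluster1` is hypothetical, and any cluster name containing spaces must be URL-encoded.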
Examples of integration with CM API
• Installation & Deployment
  • Chef
  • Puppet
  • Dell Crowbar
    • http://blog.cloudera.com/blog/2013/08/how-to-deploy-hadoop-clusters-automatically-with-dell-crowbar-and-cloudera-manager/
  • StackIQ
    • http://web.stackiq.com/blog/bid/312064/StackIQ-Cluster-Manager-now-integrated-with-Cloudera
  • WANdisco: non-stop NameNode setup
  • Several other customers/partners leveraging the APIs as part of their install & deployment process
• Monitoring & Alerting
  • Oracle Enterprise Manager (via Big Data Appliance)
  • Nagios
    • https://github.com/cloudera/cm_api/tree/master/nagios
    • https://github.com/harisekhon/nagios-plugins/blob/master/check_hadoop_cloudera_manager_metrics.pl
  • SNMP alerts integration with IBM Netcool
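A Nagios check built on the API largely comes down to translating a CM health value into a Nagios plugin exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below assumes the input is the `healthSummary` value that the API reports for a service (GOOD, CONCERNING, BAD, and so on). Fetching that value is not shown, and the function name is hypothetical. This is not the code of the plugins linked above.

```shell
#!/bin/bash
# Sketch of a Nagios-style check: map a CM health summary to a plugin exit code.
# Nagios convention: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
# ASSUMPTION: $1 is a service's "healthSummary" value as reported by the CM API;
# any value other than GOOD/CONCERNING/BAD (or no value) is reported as UNKNOWN.
health_to_nagios() {
  case $1 in
    GOOD)       echo "OK - health is $1";       return 0 ;;
    CONCERNING) echo "WARNING - health is $1";  return 1 ;;
    BAD)        echo "CRITICAL - health is $1"; return 2 ;;
    *)          echo "UNKNOWN - health is ${1:-missing}"; return 3 ;;
  esac
}
```

A plugin would end with `health_to_nagios "$summary"; exit $?` so that Nagios sees both the status line and the exit code.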
Develop & contribute your plug-ins using the Cloudera Manager API
Cloudera Manager – Monitoring via ‘tsquery’
• Introduced as part of CM4.5 release (Feb 2013)
• Great way to add interesting charts (above & beyond what is provided by default) and monitor metrics that are relevant to your clusters
• The tsquery language is used to specify statements for retrieving time-series data from the Cloudera Manager time-series data store
• Example: How do I compare disk IO for all the DataNodes that belong to a specific HDFS service?
  select bytes_read, bytes_written where roleType=DATANODE and serviceName=hdfs1
• Retrieved time-series data can be plotted in various forms: line, bar, scatter, heat maps, table lists, etc.
• The same concept is extended to user-defined triggers/alarms (new for C5!)
• More details: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/Cloudera-Manager-Diagnostics-Guide/cm5dg_chart_time_series_data.html
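The same tsquery statements can also be fetched programmatically. This sketch assumes that the API version in use exposes a `timeseries` resource that takes the statement in a `query` parameter, so check the API docs for your CM release. The base URL, the `v4` segment, the credentials and the helper names are placeholders.

```shell
#!/bin/bash
# Sketch: run a tsquery through the CM REST API with curl.
# ASSUMPTIONS: base URL, credentials and "v4" are placeholders, and the API
# version in use exposes a "timeseries" resource taking a "query" parameter.
CM_BASE=${CM_BASE:-http://cm-host.example.com:7180/api/v4}
CM_AUTH=${CM_AUTH:-admin:admin}

# The DataNode disk IO statement from the slide, for a given HDFS service name
datanode_io_query() {
  printf 'select bytes_read, bytes_written where roleType=DATANODE and serviceName=%s\n' "$1"
}

# -G sends the --data-urlencode pair as a URL-encoded query string on a GET,
# so the spaces and '=' signs inside the tsquery are escaped correctly
fetch_tsquery() {
  curl -s -G -u "$CM_AUTH" --data-urlencode "query=$1" "$CM_BASE/timeseries"
}
```

Usage: `fetch_tsquery "$(datanode_io_query hdfs1)"` prints the JSON response. Building the statement in a function keeps the service name the only thing that changes between clusters.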
Examples of Cloudera Manager ‘tsquery’
Example 1: How do I track aggregate cluster disk IO?
select dt0(read_bytes_disk_sum), dt0(write_bytes_disk_sum) where category = CLUSTER and clusterId = $CLUSTERID
Example 2: How do I compare CPU usage across hosts?
select dt0(total_cpu_user) / getHostFact(numCores, 1) * 100, dt0(total_cpu_system) / getHostFact(numCores, 1) * 100, dt0(total_cpu_nice) / getHostFact(numCores, 1) * 100, dt0(total_cpu_iowait) / getHostFact(numCores, 1) * 100, dt0(total_cpu_irq) / getHostFact(numCores, 1) * 100, dt0(total_cpu_soft_irq) / getHostFact(numCores, 1) * 100
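Example 2 repeats one pattern per CPU state, so it can be generated rather than typed. As we read the tsquery functions, dt0() takes the rate of change of a cumulative counter (with negative values clamped to zero), and getHostFact(numCores, 1) looks up the host's core count, falling back to 1. Treat both readings as assumptions. The helper name below is hypothetical.

```shell
#!/bin/bash
# Sketch: generate the per-host CPU usage tsquery instead of typing each term.
# Each term divides dt0() of a cumulative CPU counter by the host's core count
# (ASSUMED: getHostFact's second argument is the fallback when the fact is
# missing) and multiplies by 100 to get a percentage.
cpu_usage_query() {
  local metrics=("$@")
  # Default to the six CPU counters used in Example 2
  if [ ${#metrics[@]} -eq 0 ]; then
    metrics=(user system nice iowait irq soft_irq)
  fi
  local q="select " sep="" m
  for m in "${metrics[@]}"; do
    q+="${sep}dt0(total_cpu_${m}) / getHostFact(numCores, 1) * 100"
    sep=", "
  done
  printf '%s\n' "$q"
}
```

Called with no arguments, this reproduces the Example 2 statement exactly. Called as `cpu_usage_query user iowait`, it narrows the chart to two states. As an arithmetic check, assuming the counters are in CPU-seconds: on an 8-core host where dt0(total_cpu_user) is 2, the first term evaluates to 2 / 8 * 100 = 25, meaning 25% of the host's CPU capacity is spent in user mode.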
Create & contribute your ‘tsqueries’! https://github.com/cloudera/cm_charting_scrapbook
Cloudera Manager – Service Extensibility
• Introduced in C5
• Still in beta!
• Some aspects (especially Parcel management) available in CM4.x
• Example: Collaboration with Syncsort to deploy DMX-h libraries
• Single management console for CDH, non-CDH services and ISV applications
• Similar look and feel as existing services
• Easy to write (Java-free!)
• Flexible
• Independent release cycle
Analogy from the Operating Systems (OS) world
[Diagram: an ISV’s view of the OS: the core OS kernel plus package management, process/resource management, security management and data access management, with systems management alongside.]
Bringing ISV Apps to CDH
[Diagram: an ISV’s view of Hadoop: the core Hadoop/CDH kernel plus Parcels, resource management, security management and CDK APIs, with Cloudera Manager alongside for systems management.]
Integrating into the Cloudera Product Portfolio
• Package Mgmt: ability to easily package and distribute binaries/jars via “Parcels” (examples: Informatica, Syncsort)
• Resource Mgmt: ability to deploy applications as stand-alone processes or via YARN* on the Hadoop grid; resource isolation of cluster resources (examples: SAS, 0xData, Accumulo)
• Security Mgmt: support for Kerberos management; role-based access control for tables/views in Hive/Impala via Sentry
• Data Access Mgmt: HDFS and HBase API abstraction and simplification
• Systems Mgmt (Cloudera Manager):
  • Manage: deploy and upgrade (rolling) services and packages; manage configurations
  • Monitor: proactive health checks; track resource utilization; custom metrics charts
  • Diagnose: distributed log collection and searching; tag and track key events
  • Integrate: access operational tools via API; surface overall cluster metrics to ISV dashboards
Non-CDH apps and ISVs: Accumulo, Spark, Giraph, etc.
* Support for YARN planned as part of CM5.x in FY14
So... how does it work?
• A JSON file that describes your service
• A set of control scripts
• Packaged as a JAR file
• As promised, Java-free
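The sketch below shows one way to lay out such a bundle and package it. The `descriptor/service.sdl` plus `scripts/` layout is an assumption borrowed from the CSD format that Cloudera documented later for CM5, and the beta described here may differ. The helper names and the jar name in the usage note are hypothetical.

```shell
#!/bin/bash
# Sketch: lay out a service-extension bundle and package it as a JAR.
# ASSUMPTION: the descriptor/ and scripts/ directories and the service.sdl file
# name follow the CSD layout later documented for CM5; the 2013 beta may differ.
make_bundle_skeleton() {
  local dir=$1
  mkdir -p "$dir/descriptor" "$dir/scripts"
  # Placeholder JSON descriptor; replace with the full service description
  printf '{ "name" : "spark" }\n' > "$dir/descriptor/service.sdl"
  # Placeholder control script; replace with the real command dispatch
  printf '#!/bin/bash\necho "got command: $1"\n' > "$dir/scripts/control.sh"
  chmod +x "$dir/scripts/control.sh"
}

# A JAR is a zip archive; `jar` ships with the JDK, and -C changes into the
# bundle directory so archive paths start at descriptor/ and scripts/
package_bundle() {
  jar cf "$2" -C "$1" .
}
```

Usage: `make_bundle_skeleton spark-bundle && package_bundle spark-bundle SPARK-1.0.jar`. No Java code is involved, only a descriptor, a script and a standard archive tool.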
Example: Cloudera Manager Extensions - Spark
Cloudera Manager Extensions
Cloudera Manager Extensions: Spark
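#!/bin/bash
# Control script run by Cloudera Manager: $1 is the command,
# $2 the parameter file produced by configWriter
CMD=$1
PARAMS_FILE=${2:-./params.properties}

# Read master_port from the key=value parameter file and export it
# so the start script launched below can see it
MASTER_PORT=$(sed -n 's/^master_port=//p' "$PARAMS_FILE")
export MASTER_PORT

case $CMD in
  (start_master)
    exec "$SPARK_HOME/scripts/spark-start.sh" master
    ;;
  (*)
    echo "$(date) Don't understand [$CMD]"
    exit 1
    ;;
esac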
{
  "name" : "spark",
  "roles" : [{
    "name" : "master",
    "startRunner" : {
      "program" : "scripts/control.sh",
      "args" : [ "start_master",
                 "./params.properties" ]
    },
    "parameters" : [{
      "name" : "master_port",
      "type" : "port",
      "default" : 7077
    }],
    "configWriter" : {
      "generators" : [{
        "filename" : "params.properties"
      }]
    }
  }]
}
The Code
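The control script receives `./params.properties` as its second argument, so it needs a way to read individual values from that file. The sketch below assumes that the configWriter emits one `name=value` line per parameter (for example `master_port=7077`). `read_param` is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: read one parameter from a key=value properties file, such as the
# params.properties the configWriter generates for the control script.
# ASSUMPTION: the generated file holds one "name=value" line per parameter.
read_param() {
  local key=$1 file=$2
  # Print the value from the first line starting with "key=";
  # print nothing if the key is absent
  sed -n "s/^${key}=//p" "$file" | head -n 1
}
```

Usage inside a control script: `MASTER_PORT=$(read_param master_port "$2")`. Keeping the lookup in one function avoids repeating the parsing for every parameter the descriptor declares.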
Next Steps
• Documentation & SDK as part of C5 Beta2 or later (definitely before GA!)
• Working with select ISVs (SAS, Syncsort, 0xData, etc.) as part of the beta to further fine-tune this feature
Develop & contribute your Cloudera Manager service extensibility plug-ins!
Vision of CM Extensibility
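[Diagram: CDH and CM at the center, extended along three axes labeled Horizontal Extension, Vertical Extension and Service Extensibility. Labels include ISVs (Syncsort, Informatica, security ISVs, 0xData, SAS, Revolution), Ops Apps (Capacity Mgr, SLA Mgr, Cost Optimizer), services (Spark, Giraph, Accumulo), and ops tools connected via the API and SNMP (Oracle OEM, Dell, Nagios, Chef/Puppet).]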
Q&A