DICE Horizon 2020 Project, Grant Agreement no. 644869
http://www.dice-h2020.eu
Funded by the Horizon 2020 Framework Programme of the European Union
Monitoring in Big Data Frameworks
Gabriel Iuhasz
Institute e-Austria Timisoara
26 November 2015
Overview
o Introduction
o Cloud Computing and Big Data
o Monitoring Tools
o Monitoring Requirements and Solutions
o Conclusions
Introduction
o Big Data in Cloud computing
  o Volume, Velocity, Variety and Veracity
  o Cost Reduction, Rapid provisioning/time to market, Flexibility/scalability
o DevOps and Cloud
  o Development and Operations
  o Communication, Collaboration, Integration, Automation
o DevOps Monitoring
  o Measurement is a key aspect of DevOps
Big Data in Cloud Computing
o Challenges of Big Data On Cloud
  o Low Latency real-time data
    o Virtualization overhead
    o Multi-tenancy overhead
  o Scalability
    o Lack of RDBMS support
  o Availability
    o Data integrity/privacy
Hadoop Ecosystem
Cloudera
HortonWorks
Monitoring Architecture
o Cross layer monitoring of big data platforms
  o Types of metrics are highly dependent on the type of the application
  o Have to be decided on a platform/application basis
o Centralized Monitoring
  o All resource states are sent to a centralized monitoring server
  o Metrics are continuously polled from monitored components
  o Single point of failure
  o Lacks scalability
o Decentralized Monitoring
  o No single point of failure
  o Central authority is diffused
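The centralized model above can be illustrated with a toy poller. This is only a sketch under stated assumptions: `CentralMonitor`, the probe callables, and the node names are hypothetical and do not correspond to any tool discussed in these slides.

```python
from typing import Callable, Dict, Optional

# Hypothetical probe type: a zero-argument callable returning one metric value.
Probe = Callable[[], float]


class CentralMonitor:
    """Toy central server that polls every registered component in turn."""

    def __init__(self) -> None:
        self.probes: Dict[str, Probe] = {}
        self.latest: Dict[str, Optional[float]] = {}

    def register(self, name: str, probe: Probe) -> None:
        self.probes[name] = probe

    def poll_once(self) -> Dict[str, Optional[float]]:
        # One polling round visits every component, so the cost of a round
        # grows linearly with the number of monitored components.
        for name, probe in self.probes.items():
            try:
                self.latest[name] = probe()
            except Exception:
                # An unreachable component is recorded as "no data".
                self.latest[name] = None
        return dict(self.latest)


def failing_probe() -> float:
    # Stand-in for a node that cannot be reached.
    raise ConnectionError("node unreachable")


monitor = CentralMonitor()
monitor.register("datanode-1", lambda: 0.42)  # illustrative value
monitor.register("datanode-2", failing_probe)
snapshot = monitor.poll_once()
print(snapshot)
```

All state lives in the single `monitor` object: if it goes down, no metrics are collected at all, which is the single point of failure named above.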
Tools
o Hadoop Performance Monitoring UI
  o Lightweight monitoring UI for Hadoop server
  o Uses Hadoop metrics (using Sinks)
o SequenceIQ
  o Based on ELK stack and Docker containers
  o ElasticSearch can be easily scaled horizontally
  o Logstash server on client side
o Ganglia
  o Scalable distributed monitoring system
  o Low per-node overhead
  o Focused on System Metrics
  o Gmond, gmetad and Web Front-end
Tools II
o Apache Chukwa
  o Built on top of HDFS
  o Easily scalable
  o Potentially high overhead
o Hadoop Vaidya
  o Rule-based diagnostic tool for M/R jobs
  o Performs post-run results analysis
o Nagios
  o Plugin-based architecture
  o Uses a centralized server to collect metrics
  o Possible to create a hierarchical deployment
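As an illustration of the plugin-based approach, a Nagios check reports its state through an exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and one status line, optionally followed by performance data after a `|`. The sketch below follows that convention; the function name `check_disk_usage` and the thresholds are assumptions chosen for the example.

```python
from typing import Tuple

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_disk_usage(used_percent: float,
                     warn: float = 80.0,
                     crit: float = 90.0) -> Tuple[int, str]:
    """Return (exit_code, status_line) in the style of a Nagios plugin."""
    if not 0.0 <= used_percent <= 100.0:
        return UNKNOWN, "DISK UNKNOWN - invalid reading"
    # Performance data: label=value[unit];warn;crit
    perfdata = f"used={used_percent:.1f}%;{warn:.0f};{crit:.0f}"
    if used_percent >= crit:
        state, label = CRITICAL, "CRITICAL"
    elif used_percent >= warn:
        state, label = WARNING, "WARNING"
    else:
        state, label = OK, "OK"
    return state, f"DISK {label} - {used_percent:.1f}% used | {perfdata}"


# Illustrative reading; a real plugin would measure the value itself.
code, line = check_disk_usage(85.0)
print(code, line)
```

A real plugin would print the status line and terminate with `sys.exit(code)`, so that the Nagios server can read the state from the process exit status.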
Requirements
o Difficulties in cloud monitoring
  o Scale
  o Velocity or Timeliness
  o Constant changes
o The need for scalability and automation
  o Easy re-configurability
  o Lightweight metrics collectors
  o Identifying pertinent metrics
DICE Overview
[Figure: DICE overview diagram. Labels: Platform-Indep. Model, Domain Models, Continuous Validation, Continuous Monitoring, Data Awareness, Architecture Model, Platform-Specific Model, Platform Description, DICE MARTE, Deployment & Continuous Integration, DICE IDE, Big Data, QA Models]
DICE Monitoring Platform
o RESTful Web Service
  o Used to deploy and configure all core/auxiliary components
  o Used to query ElasticSearch
    o Exports metrics in: JSON, CSV, OSLC Perf. Mon 2.0 (RDF+XML)
  o Used for auto-scaling of monitoring solution
o ELK Stack
  o Extremely flexible/configurable
  o Horizontally scalable
  o Can accept various input and output formats
  o ETL via Logstash server (filters)
  o Logstash-forwarder secure transmission (new Beats Data Shippers)
  o Visualization using Kibana4
o Collectd
  o Statistics collection daemon
  o A lot of plugins available
  o Simple configuration
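The query-and-export step can be sketched as follows. This is not the DICE REST interface, which is not shown in these slides: the helper names, the field names (`host`, `plugin`, `value`) and the sample response are assumptions made for illustration. Only the Elasticsearch `range` query structure and the `hits.hits[]._source` response layout are standard.

```python
import csv
import io
import json
from typing import Any, Dict, List


def build_range_query(start: str, end: str, size: int = 100) -> Dict[str, Any]:
    """Build an Elasticsearch query body selecting documents in a time window."""
    return {
        "size": size,
        "query": {"range": {"@timestamp": {"gte": start, "lte": end}}},
    }


def hits_to_csv(response: Dict[str, Any], fields: List[str]) -> str:
    """Flatten the `_source` of each hit into CSV rows with the given columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for hit in response["hits"]["hits"]:
        writer.writerow(hit["_source"])
    return buffer.getvalue()


# Hand-written sample shaped like a search response; the values are made up.
sample_response = {"hits": {"hits": [
    {"_source": {"@timestamp": "2015-11-26T10:00:00Z", "host": "node-1",
                 "plugin": "cpu", "value": 12.5}},
    {"_source": {"@timestamp": "2015-11-26T10:00:10Z", "host": "node-2",
                 "plugin": "memory", "value": 63.0}},
]}}

query = build_range_query("2015-11-26T10:00:00Z", "2015-11-26T11:00:00Z")
print(json.dumps(query))  # JSON body that would be sent to the search endpoint
csv_text = hits_to_csv(sample_response, ["@timestamp", "host", "plugin", "value"])
print(csv_text)
```

The same list of hits could be serialized with `json.dumps` for the JSON export; the OSLC (RDF+XML) export is not covered by this sketch.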
DICE Monitoring Platform II
DICE Monitoring Platform Scaled
DICE Monitoring Platform Variant
Conclusions
o We have given a short overview of current monitoring platforms
o Identified key requirements for Big Data Monitoring
  o Scaling, Autonomy, Timeliness
  o Automation via Chef recipes
o Presented the current Architecture of the DICE Monitoring Platform
  o Currently collecting from: HDFS, YARN, Spark, Storm, Kafka
  o In the near future: Cassandra, possibly Trident
o Creating the full lambda architecture based anomaly detection platform
  o ElasticSearch used as serving layer
Thank You!
Questions?