Rocana Deep Dive: OC Big Data Meetup #19, Sept 21st, 2016
TRANSCRIPT
© Rocana, Inc. All Rights Reserved. | 2
Who am I • Field Engineer @ Rocana
• aka Sales Engineer • based in Orange County, CA
• Previously @ Cloudera
• Teach meditation & yoga
• Avid cyclist
• From Montreal, Canada
Rocana at a glance
Founded and led by industry veterans • Innovators in big data and analytics • Published authorities in open source
Backed by world-class VCs • Google Ventures, General Catalyst, TOBA Capital
Recognized by industry influencers • Gartner Cool Vendor, Forbes Tech Council
Validated by customers and partners • Cloudera, Oracle, Confluent, ExtraHop, F100 users
Redefining operational visibility • Unmatched scale and analytics for all IT and customer events
Omer Trajman Co-Founder and CEO
Don Brown Co-Founder and CIO
Eric Sammer Co-Founder and CTO
What we do • We build a system for the operation of modern data centers
• Triage and diagnostics, exploration, trends, advanced analytics of complex systems
• Our data: logs, metrics, human activity, anything that occurs in the data center
• Enterprise Software (i.e. we build for others.)
• Today: how we built what we built
Typical customer use cases • >100K events / sec (8.6B events / day), sub-second end-to-end latency, full fidelity retention, critical use cases
• Quality of service - “are credit card transactions happening fast enough?”
• Fraud detection - “detect, investigate, prosecute, and learn from fraud.”
• Forensic diagnostics - “what really caused the outage last Friday?”
• Security - “who’s doing what, where, when, why, and how, and is that ok?”
• User behavior - “capture and correlate user behavior with system performance, then feed it to downstream systems in real time.”
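The throughput figure above can be checked with a little arithmetic. The sketch below converts a sustained per-second rate into a daily count; the average event size used for the volume estimate is purely an assumption for illustration, not a number from the talk.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def events_per_day(events_per_second: int) -> int:
    """Convert a sustained per-second event rate into a daily event count."""
    return events_per_second * SECONDS_PER_DAY


def daily_volume_tb(events_per_second: int, avg_event_bytes: int) -> float:
    """Rough daily raw volume in TB; avg_event_bytes is a hypothetical input."""
    return events_per_day(events_per_second) * avg_event_bytes / 1e12


daily = events_per_day(100_000)
print(f"{daily:,} events/day")  # 8,640,000,000 events/day

# Assuming (hypothetically) 500 bytes per event on average:
print(f"{daily_volume_tb(100_000, 500):.2f} TB/day")
```

At 100K events per second the daily count is 8.64 billion, which the slide rounds to 8.6B.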
A new approach is needed
CURRENT STATE OF IT MONITORING • Data collected from few sources, often < 1 TB/day • Retention measured in days or weeks • No analytics, just brute-force search
DESIRED STATE • Scale: PBs of data from all sources, open & accessible • Duration: data retained online for months & years • Analytics: machine learning surfaces issues & opportunities
Rocana Architecture
[Architecture diagram]
SOURCES: Machine • Wire • Application • Customer
DATA COLLECTION: syslog, log files, metrics, APIs
EVENT DATA WAREHOUSE: Event Data Bus • Transformation • Aggregation • Advanced Analytics • Data Lifecycle • Event Storage
DATA EXPLORATION & VISUALIZATION: Rocana Visualizations • Rocana Search • 3rd Party Integrations
Rocana architecture: Data collection
Collection of all event data from every system across IT and the business
Collect metrics and events across Machine, Wire, Application, and Customer data • Rocana Agent, Java Client, Syslog Collector, Log4j Appender, REST API, StatsD, NetFlow
Rocana Agent
• Written in Go • Cross-platform portability • Compiles down to native machine code • More performant than Java
• Host Metrics are captured • Disk I/O, CPU, Memory
• Configurations and Transformations
• Numerous Ingest Points • Syslog • File Tail • REST API • Spooling Directory • Windows Event Logs • Log4j • Java API • StatsD • NetFlow
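To make one of these ingest points concrete, here is a minimal sketch of parsing a StatsD-style line (`name:value|type`) into a metric record. It is written in Python for brevity even though the agent itself is written in Go, and the `Metric` class and `parse_statsd` function are hypothetical names for illustration, not the Rocana Agent's API.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Metric:
    name: str
    value: float
    kind: str   # StatsD type: "c" counter, "g" gauge, "ms" timer
    ts_ms: int  # receive time in epoch milliseconds


def parse_statsd(line: str, now_ms: Optional[int] = None) -> Metric:
    """Parse 'name:value|type' (any trailing '|@rate' field is ignored)."""
    name, sep, rest = line.strip().partition(":")
    fields = rest.split("|")
    if not sep or not name or len(fields) < 2:
        raise ValueError(f"not a StatsD line: {line!r}")
    ts = now_ms if now_ms is not None else int(time.time() * 1000)
    return Metric(name=name, value=float(fields[0]), kind=fields[1], ts_ms=ts)


m = parse_statsd("api.latency:320|ms", now_ms=1000)
print(m.name, m.value, m.kind)  # api.latency 320.0 ms
```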
Rocana architecture: Event data warehouse
Centralized warehouse for processing, analysis, and retaining event data
• Architecture built for real-time transformation, aggregation, and exploration of PBs of events • Extensible data model provides a flexible way of representing time-oriented discrete events • Governance controls such as role-based access control (RBAC)
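One way to picture an extensible model for time-oriented discrete events is a fixed envelope plus free-form attributes. The sketch below is an assumption for illustration only: the field names (`ts_ms`, `event_type`, `host`, `service`, `body`, `attributes`) are hypothetical and are not taken from Rocana's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Event:
    ts_ms: int        # event time, epoch milliseconds
    event_type: str   # e.g. "syslog" or "metric" (hypothetical values)
    host: str
    service: str
    body: bytes = b""                                          # raw payload
    attributes: Dict[str, str] = field(default_factory=dict)  # extensible part


def in_window(events: Iterable[Event], lo_ms: int, hi_ms: int) -> List[Event]:
    """Return events with lo_ms <= ts_ms < hi_ms, ordered by time."""
    return sorted((e for e in events if lo_ms <= e.ts_ms < hi_ms),
                  key=lambda e: e.ts_ms)


log = Event(2000, "syslog", "web01", "nginx", b"GET /", {"status": "200"})
metric = Event(1000, "metric", "web01", "agent",
               attributes={"name": "cpu.user", "value": "0.42"})
print([e.event_type for e in in_window([log, metric], 0, 5000)])
# ['metric', 'syslog']
```

Both a log line and a host metric fit the same envelope, so new event types only need new attribute keys rather than a new record type.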
Rocana architecture: Exploration, visualization, analysis
Powerful embedded analytics and visualizations plus integration with 3rd parties
• Data visualization and exploration via Rocana Ops’ interactive user interface, or via 3rd party integration • First natively parallelized operational analytics system available
Rocana architecture: Enterprise event messaging system
Central nervous system facilitating real-time data delivery and processing
• Apache Kafka provides a high-throughput, highly robust, open messaging system
About Kafka
• Publish/subscribe messaging system, originally developed at LinkedIn
• High throughput (hundreds of thousands of messages/sec)
• Low latency (sub-second to low seconds)
• Fault-tolerant (Replicated and Distributed)
• Payload-agnostic messaging (brokers treat message contents as opaque bytes)
• Standardizes format and delivery
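The publish/subscribe model in the bullets above can be illustrated with a toy in-memory log. This is not the Kafka client API: `Topic` and `Subscriber` are hypothetical classes that only mimic two ideas, an append-only partitioned log and per-subscriber offsets, so that independent consumers each see every message.

```python
from collections import defaultdict
from typing import DefaultDict, List


class Topic:
    """Toy append-only log split into partitions (not a real Kafka client)."""

    def __init__(self, partitions: int = 2) -> None:
        self.logs: List[List[bytes]] = [[] for _ in range(partitions)]

    def publish(self, key: bytes, value: bytes) -> int:
        # Deterministic toy partitioner: the same key always maps to the
        # same partition, which preserves per-key ordering.
        partition = sum(key) % len(self.logs)
        self.logs[partition].append(value)
        return partition

    def read(self, partition: int, offset: int) -> List[bytes]:
        return self.logs[partition][offset:]


class Subscriber:
    """Tracks its own offsets, so subscribers never steal from each other."""

    def __init__(self, topic: Topic) -> None:
        self.topic = topic
        self.offsets: DefaultDict[int, int] = defaultdict(int)

    def poll(self) -> List[bytes]:
        records: List[bytes] = []
        for partition in range(len(self.topic.logs)):
            batch = self.topic.read(partition, self.offsets[partition])
            self.offsets[partition] += len(batch)
            records.extend(batch)
        return records


topic = Topic()
search, hdfs = Subscriber(topic), Subscriber(topic)
topic.publish(b"web01", b"event-1")
topic.publish(b"web02", b"event-2")
print(sorted(search.poll()))  # [b'event-1', b'event-2']
print(sorted(hdfs.poll()))    # [b'event-1', b'event-2']
print(search.poll())          # []
```

Because the log is retained and each subscriber keeps its own position, a search indexer and an HDFS writer can consume the same stream at different speeds.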
Rocana architecture: Transformation
Real-time transformation of data and indexing for immediate analysis
• Parse log messages, add fields, and filter in-flight • Index in real-time for immediate search/query capability
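The three in-flight steps named above (parse, add fields, filter) can be sketched as a small generator pipeline. The log line layout, the `datacenter` field, and the rule of dropping DEBUG lines are all assumptions for illustration; the talk does not describe Rocana's actual transformation configuration.

```python
import re
from typing import Dict, Iterable, Iterator, Optional

# Hypothetical line layout: "<timestamp> <host> <LEVEL> <message>"
LINE_RE = re.compile(r"^(?P<ts>\S+) (?P<host>\S+) (?P<level>[A-Z]+) (?P<msg>.*)$")


def parse(line: str) -> Optional[Dict[str, str]]:
    """Parse one raw line into named fields, or None if it does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


def transform(lines: Iterable[str], datacenter: str) -> Iterator[Dict[str, str]]:
    """Parse, filter, and enrich events one at a time as they stream through."""
    for line in lines:
        event = parse(line)
        if event is None:
            continue  # unparseable line; dropped in this sketch
        if event["level"] == "DEBUG":
            continue  # filter in flight
        event["datacenter"] = datacenter  # add a field
        yield event


raw = [
    "2016-09-21T10:00:00Z web01 ERROR payment timeout",
    "2016-09-21T10:00:01Z web01 DEBUG cache hit",
    "garbage",
    "2016-09-21T10:00:02Z web02 INFO request served",
]
for event in transform(raw, datacenter="oc-1"):
    print(event["host"], event["level"], event["msg"], event["datacenter"])
```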
Rocana architecture: Data aggregation
Time series datasets of metric data from device and application activity
• Saved searches and combined datasets • Correlation of events • Alert on aggregated events
Metrics
• Metrics Consumer • Metrics storage • Host metrics (produced by the agent) and metrics parsed from events, e.g. latency • Data visualized in Rocana Ops standard and custom dashboards
• Metrics Consumer (aggregation) • Aggregates these events in 1-, 5-, 10-, and 60-minute intervals • Event volume, total value, average value, and other stats • Data is used by the Rocana analytics consumer and the web interface for anomaly detection and status visualizations
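The interval aggregation described above amounts to bucketing samples by window start and keeping a few running statistics. A minimal sketch, assuming samples arrive as `(timestamp_ms, value)` pairs; the function name and output layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

WINDOWS_MIN = (1, 5, 10, 60)  # interval widths named on the slide


def aggregate(samples: Iterable[Tuple[int, float]],
              window_min: int) -> Dict[int, Dict[str, float]]:
    """Group samples into fixed windows; return count, total, and average."""
    width_ms = window_min * 60_000
    acc: Dict[int, Dict[str, float]] = defaultdict(
        lambda: {"count": 0, "total": 0.0})
    for ts_ms, value in samples:
        start = ts_ms - ts_ms % width_ms  # window start containing ts_ms
        acc[start]["count"] += 1
        acc[start]["total"] += value
    return {
        start: {"count": b["count"], "total": b["total"],
                "avg": b["total"] / b["count"]}
        for start, b in sorted(acc.items())
    }


samples = [(0, 10.0), (30_000, 20.0), (60_000, 30.0),
           (299_999, 40.0), (300_000, 50.0)]
for minutes in WINDOWS_MIN:
    print(minutes, aggregate(samples, minutes))
```

With these samples the 5-minute rollup has two windows: the first holds four samples totalling 100.0 (average 25.0) and the second holds one sample of 50.0.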
Rocana architecture: Advanced analytics
Out-of-the-box machine-learning algorithms guide the analysis process
• Create models on every metric • Set baselines and alarms on anomalies (WARN scores), with correlation and alerting
Anomalies
• Advanced Analytics Consumer • Anomalous event detection • Piecewise linear regression; predictive intervals; event clustering • Specific locations, hosts, and services (or any combination of these sources) • Detection engine compares baseline models of expected activity to actual volumes: the consumer looks for anomalies by comparing incoming event volume to the history for that model
• Example: • The analytics consumer evaluates event volumes for hosts, comparing current volumes to the baseline established from the records of several previous time periods. • When the analytics consumer detects event volumes outside the norm, the anomalies are submitted to Kafka as events • Anomalies appear in the web interface as annotations
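The host-volume example above can be sketched with a deliberately simpler baseline than the one the slide names. Instead of piecewise linear regression, this sketch uses the mean of previous periods plus or minus `k` standard deviations as the expected interval; the function names and the default `k = 3` are assumptions for illustration.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def baseline_interval(history: Sequence[float],
                      k: float = 3.0) -> Tuple[float, float]:
    """Expected range from past volumes: mean +/- k standard deviations.

    A simple stand-in for the regression-based models named on the slide.
    Needs at least two historical points.
    """
    mu, sigma = mean(history), stdev(history)
    return mu - k * sigma, mu + k * sigma


def is_anomalous(current: float, history: Sequence[float],
                 k: float = 3.0) -> bool:
    """True when the current volume falls outside the baseline interval."""
    low, high = baseline_interval(history, k)
    return current < low or current > high


# Event volumes for one host over several previous periods (made-up numbers).
history = [100, 104, 98, 101, 97, 102]
print(is_anomalous(103, history))  # False
print(is_anomalous(250, history))  # True
print(is_anomalous(40, history))   # True
```

In the pipeline described above, a `True` result would become a new anomaly event published back to Kafka rather than a printed value.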
Rocana architecture: Data lifecycle management
Policy management and governance for collected data
• Maintain datasets over time • Manage compliance, including data expiration policies • Optimize HDFS by combining small files into large files
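Two of the lifecycle tasks above, expiring old data and combining small files, can be sketched as planning functions that decide what to do without touching storage. Everything here is hypothetical: the date-partitioned layout, the function names, and the greedy grouping strategy are assumptions, not Rocana's implementation.

```python
from datetime import date, timedelta
from typing import Dict, List


def expired_partitions(partition_dates: List[date], today: date,
                       retention_days: int) -> List[date]:
    """Daily partitions that fall strictly before the retention cutoff."""
    cutoff = today - timedelta(days=retention_days)
    return [d for d in partition_dates if d < cutoff]


def plan_compaction(file_sizes: Dict[str, int],
                    target_bytes: int) -> List[List[str]]:
    """Greedily group small files into batches of at most target_bytes."""
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in sorted(file_sizes.items()):
        if size >= target_bytes:
            continue  # already large enough; leave it alone
        if current and current_size + size > target_bytes:
            if len(current) > 1:  # merging a single file gains nothing
                batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if len(current) > 1:
        batches.append(current)
    return batches


parts = [date(2016, 1, 1), date(2016, 6, 1), date(2016, 9, 1)]
print(expired_partitions(parts, today=date(2016, 9, 21), retention_days=90))

sizes = {"a": 40, "b": 50, "c": 30, "d": 200, "e": 10}
print(plan_compaction(sizes, target_bytes=100))  # [['a', 'b'], ['c', 'e']]
```

Fewer, larger files matter on HDFS because every file consumes NameNode memory regardless of its size.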
Rocana architecture: Open event storage
It’s your data – Open data store provides you with complete access
• Data is persisted to HDFS for open long-term storage • Events are represented using an Apache Avro schema
Rocana Search High cardinality, low latency, parallel search for time-oriented events
Rocana Search: Limitless search and query
Problem: Monitoring approaches that limit online retention to a few days or weeks can’t support critical business use cases • E.g. YoY Black Friday comparisons • E.g. Advanced Persistent Threat detection
Solution: Rocana Search provides true online retention and search of data from the moment it is captured, and for as long as needed (years) • Efficiently and linearly scales to ingest millions of events per second • Query times hold constant regardless of data volumes • Dynamic sharding helps minimize downtime and maintenance overhead
Rocana Search: High cardinality, low latency, parallel search for time-oriented events
• Parallel ingest and query, built for large clusters and high performance
• Purpose-built, time-oriented search architecture ingests 10x faster than current-state monitoring tools
• Query arbitrarily large amounts of indexed data on disk
• Performance is not impacted by volume or scale (dynamic sharding)
• Handles diverse dataset types, whether high or low cardinality
• Add/remove nodes dynamically without manual restarts or rebalances
[Diagram: four nodes, each running Rocana Search, fed by Kafka]
Some key features of Rocana Search
• Fully parallelized ingest and query, built for large clusters
• Every node is an indexer
[Diagram: four Hadoop nodes, each running Rocana Search, fed by Kafka]
Some key features of Rocana Search
• Every node is a query coordinator and executor
[Diagram: a query client connected to four Rocana Search nodes, each with a coordinator (Coord) and an executor (Exec)]
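The "every node is a query coordinator and executor" idea can be sketched as scatter-gather over time-partitioned shards. This is a toy model under stated assumptions: `Shard` and `Node` are hypothetical classes, shards are plain in-memory lists, and a thread pool stands in for network fan-out. It is not Rocana Search's actual design.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Event = Tuple[int, str]  # (timestamp_ms, message)


class Shard:
    """Holds the events for one time range [start_ms, end_ms)."""

    def __init__(self, start_ms: int, end_ms: int, events: List[Event]) -> None:
        self.start_ms, self.end_ms, self.events = start_ms, end_ms, events

    def overlaps(self, lo: int, hi: int) -> bool:
        return self.start_ms < hi and lo < self.end_ms

    def search(self, term: str, lo: int, hi: int) -> List[Event]:
        return [e for e in self.events if lo <= e[0] < hi and term in e[1]]


class Node:
    def __init__(self, shards: List[Shard]) -> None:
        self.shards = shards

    def execute(self, term: str, lo: int, hi: int) -> List[Event]:
        """Executor role: search local shards, skipping those out of range."""
        hits: List[Event] = []
        for shard in self.shards:
            if shard.overlaps(lo, hi):  # time-based pruning
                hits.extend(shard.search(term, lo, hi))
        return hits

    def coordinate(self, cluster: List["Node"], term: str,
                   lo: int, hi: int) -> List[Event]:
        """Coordinator role: fan out to every node, then merge by time."""
        with ThreadPoolExecutor(max_workers=len(cluster)) as pool:
            partials = list(pool.map(lambda n: n.execute(term, lo, hi), cluster))
        return sorted(hit for part in partials for hit in part)


cluster = [
    Node([Shard(0, 100, [(10, "login ok"), (50, "disk error")])]),
    Node([Shard(100, 200, [(120, "net error"), (150, "login ok")])]),
    Node([Shard(200, 300, [(210, "disk error")])]),
]
# Any node can coordinate; here the client happens to reach node 1.
print(cluster[1].coordinate(cluster, "error", 0, 200))
# [(50, 'disk error'), (120, 'net error')]
```

Because shards are keyed by time, a query over a bounded range touches only the shards that overlap it, which is one plausible reading of why query cost need not grow with total retained data.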
What we’ve accomplished
In the context of search, scale means:
• High cardinality: Billions of unique events per day
• High speed ingest: Hundreds of thousands of events per second
• Not having to age data out of the dataset
• Handling large, concurrent queries, while ingesting data
• Fully utilizing modern hardware
Rocana Architecture – Hadoop Recap
[Diagram. Key: 3rd Party, Hadoop, Rocana. Stages: Data Sources, Collection and Transformation, Analysis, Visualize, Storage]
• Data sources: Syslogd, Log4J, REST, ExtraHop, NetFlow, Custom
• Supervisor and Agent
• Kafka Cluster: Kafka Brokers, Event Stream Processor
• Downstream of Kafka: Rocana Search, HDFS Consumer, Analytics Consumer, Metadata Consumer, Metric Consumer, Custom Consumer
• HDFS
• Impalad, MapReduce, Spark
• Rocana Ops, Other BI System, HUE, Grafana
Why Apache Hadoop • Open source system
• Proven linear scalability
• Multiple execution engines • Allow for the right tool for the right job
• Leverage commodity hardware
Why Apache Impala • Multi-User Performance & Usability
• Cost-based optimization allows for more users and tools to run a broader range of queries
• Compatibility • Provides both ANSI SQL and vendor-specific extensions • Compatibility with the leading BI partners
• Flexibility • Parquet provides best-of-breed columnar performance across Hadoop frameworks
• Native Integration • Unified with Hadoop resource management, metadata, security, and management
Why Parquet • Columnar storage:
• Column-major instead of the traditional row-major layout; used by all high-end analytic DBMS
• Optimized storage of nested data structures: • Patterned after Dremel’s ColumnIO format
• Extensible set of column encodings: • Run-length and dictionary encodings in current version (1.2) • Delta and optimized string encodings in 2.0
• Embedded statistics: • Parquet 2.0 stores inlined column statistics for further optimization of scan efficiency
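The column-major layout and the two encodings named above can be illustrated in a few lines. This sketch shows the ideas only: it pivots rows into columns, dictionary-encodes one column, and run-length encodes the resulting codes. It is not Parquet's actual on-disk format, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each value with a small integer index into a dictionary."""
    dictionary: List[str] = []
    index: Dict[str, int] = {}
    codes: List[int] = []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def run_length_encode(codes: List[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive repeats into (code, run_length) pairs."""
    runs: List[Tuple[int, int]] = []
    for code in codes:
        if runs and runs[-1][0] == code:
            runs[-1] = (code, runs[-1][1] + 1)
        else:
            runs.append((code, 1))
    return runs


# Row-major records: (host, level)
rows = [("web01", "INFO"), ("web01", "INFO"), ("web01", "WARN"), ("web02", "INFO")]

# Column-major: all values of one column are stored together.
hosts, levels = (list(col) for col in zip(*rows))

dictionary, codes = dictionary_encode(hosts)
print(dictionary)                # ['web01', 'web02']
print(codes)                     # [0, 0, 0, 1]
print(run_length_encode(codes))  # [(0, 3), (1, 1)]
```

Storing a column contiguously is what makes these encodings effective: values of the same field repeat far more often next to each other than values across a whole row do.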