Technology behind Real Time Log Analytics ELK- Elasticsearch, Logstash and Kibana By Supaket Wongkampoo @ Predictive Analytics and Data Science Conference 28 May 2016

Upload: data-science-thailand

Post on 16-Apr-2017


TRANSCRIPT

Page 1: Technology behind-real-time-log-analytics

Technology behind Real Time Log Analytics

ELK - Elasticsearch, Logstash and Kibana

By Supaket Wongkampoo @ Predictive Analytics and Data Science Conference

28 May 2016

Page 2: Technology behind-real-time-log-analytics

SUPAKET WONGKAMPOO

Software Engineer @ Agoda

*DevOps in passion*

- Full Stack Developer

- Virtualisation and Infrastructure as Code (Puppet/Ansible)

- Release Management and continuous development

- Real-time Log Analytics

Page 3: Technology behind-real-time-log-analytics

State of the Art, Logging Terminology in Large Scale Data processing

Page 4: Technology behind-real-time-log-analytics

Common use cases

•*Issue debugging

•*Performance analysis

•Security analysis

•*Predictive analysis

•Internet of things (IoT) and logging

Page 5: Technology behind-real-time-log-analytics

Challenges in log analysis

•*Non-consistent log format

•*Decentralized logs

•Expert knowledge requirement

Page 6: Technology behind-real-time-log-analytics

Non-consistent log format

TOMCAT LOGS

A typical Tomcat server startup log entry will look like this:

May 24, 2015 3:56:26 PM org.apache.catalina.startup.HostConfig deployWAR INFO: Deployment of web application archive \soft\apache-tomcat-7.0.62\webapps\sample.war has finished in 253 ms

APACHE ACCESS LOGS – COMBINED LOG FORMAT

A typical Apache access log entry will look like this:

127.0.0.1 - - [24/May/2015:15:54:59 +0530] "GET /favicon.ico HTTP/1.1" 200 21630

IIS LOGS

A typical IIS log entry will look like this:

2012-05-02 17:42:15 172.24.255.255 - 172.20.255.255 80 GET /images/favicon.ico - 200 Mozilla/4.0+(compatible;MSIE+5.5;+Windows+2000+Server)
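To make the format problem concrete, here is a minimal Python sketch that maps the three sample entries above onto one common set of fields. The regular expressions, the `normalize` function and the output field names are assumptions written only against these three samples; real log formats vary with server configuration.

```python
import re
from datetime import datetime

# Each entry: (source name, pattern, timestamp format). All three patterns
# are illustrative and only cover the sample lines shown on this slide.
FORMATS = [
    ("iis",
     re.compile(r'(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<client>\S+) '
                r'\S+ \S+ \d+ (?P<method>\S+) (?P<path>\S+) \S+ (?P<status>\d{3})'),
     "%Y-%m-%d %H:%M:%S"),
    ("apache",
     re.compile(r'(?P<client>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
                r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'),
     "%d/%b/%Y:%H:%M:%S %z"),
    ("tomcat",
     re.compile(r'(?P<ts>\w{3} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M) '
                r'(?P<logger>\S+) (?P<method>\S+) (?P<level>[A-Z]+): (?P<message>.*)'),
     "%b %d, %Y %I:%M:%S %p"),
]


def normalize(line):
    """Return a dict with a 'source', an ISO 'timestamp' and the parsed
    fields, or None when no known format matches the line."""
    for source, pattern, ts_format in FORMATS:
        match = pattern.match(line)
        if match:
            fields = match.groupdict()
            parsed = datetime.strptime(fields.pop("ts"), ts_format)
            fields["timestamp"] = parsed.isoformat()
            fields["source"] = source
            return fields
    return None
```

Every additional format needs another hand-written pattern, which is the maintenance burden this slide points at.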

Page 7: Technology behind-real-time-log-analytics

DECENTRALIZED LOGS

With only one or two servers, finding information in logs means running cat or tail, or piping their output to grep. Across many servers, this manual approach quickly becomes impractical.
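The manual workflow described here can be sketched in Python as a small grep over a list of local files. The function name `grep_logs` and its return shape are assumptions for illustration; the point is that the search only sees files on the machine where it runs.

```python
import re
from pathlib import Path
from typing import Iterable, Iterator, Tuple


def grep_logs(paths: Iterable[str], pattern: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line number, line) for every line matching the pattern,
    roughly what `cat file... | grep pattern` does on a single host."""
    regex = re.compile(pattern)
    for path in paths:
        # errors="replace" keeps the scan going over badly encoded lines.
        with Path(path).open(encoding="utf-8", errors="replace") as handle:
            for lineno, line in enumerate(handle, start=1):
                if regex.search(line):
                    yield path, lineno, line.rstrip("\n")
```

With logs spread over many hosts, this loop would have to be repeated on each machine and the results merged by hand.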

Page 8: Technology behind-real-time-log-analytics

Elasticsearch

Page 9: Technology behind-real-time-log-analytics

Elasticsearch - Key feature

• Schema-free, REST & JSON based document store

• Distributed and horizontally scalable

• Open Source: Apache License 2.0

• Zero configuration

• Written in Java, extensible
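As a rough illustration of "REST & JSON based", the Python sketch below builds (but does not send) an HTTP request that would store one JSON document. The helper name `build_index_request`, the index name and the `localhost:9200` address are assumptions; the `/_doc/` path follows newer Elasticsearch releases, while versions from around the time of this talk used a type name in that position.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_id, document):
    """Build a PUT request carrying one JSON document.
    Nothing is sent until the request is passed to urllib.request.urlopen."""
    body = json.dumps(document).encode("utf-8")
    url = f"{base_url}/{index}/_doc/{doc_id}"
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical document and index name, for illustration only.
request = build_index_request(
    "http://localhost:9200", "logs-2016.05.28", "1",
    {"message": "GET /favicon.ico", "status": 200},
)
```

Passing `request` to `urllib.request.urlopen` would send it to a node listening at that address; no schema has to be declared first, which is what "schema-free" refers to.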

Page 10: Technology behind-real-time-log-analytics

Elasticsearch - Term

• Index - Logical collection of data, analogous to a database; might be time based

• Replication - Read scalability; removes the single point of failure (SPOF)

• Sharding - Splits logical data over several machines; write scalability, control of data flows
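The idea behind sharding and replication can be sketched with a toy router in Python. This is not Elasticsearch's actual implementation: it uses its own hash function and allocation logic, and `pick_shard`, `place_copies` and the CRC32 hash below are stand-ins chosen only to keep the example self-contained.

```python
import zlib
from typing import List


def pick_shard(routing_key: str, primary_shards: int) -> int:
    """Map a document key to a shard number with a stable hash,
    so the same key always lands on the same shard."""
    return zlib.crc32(routing_key.encode("utf-8")) % primary_shards


def place_copies(shard: int, replicas: int, nodes: List[str]) -> List[str]:
    """Pick distinct nodes for the primary and its replicas (toy round-robin).
    Copies on different nodes are what remove the single point of failure."""
    count = min(replicas + 1, len(nodes))
    return [nodes[(shard + offset) % len(nodes)] for offset in range(count)]
```

Writes spread across shards by key, while reads can be served by any copy of a shard.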

Page 11: Technology behind-real-time-log-analytics

Elasticsearch - Distributed and scalable

Page 12: Technology behind-real-time-log-analytics

Elasticsearch - Distributed and scalable

Page 13: Technology behind-real-time-log-analytics

Elasticsearch - use cases

• Product search engine: products grouped, with filtering

• Scoring

✴ Possible influential factors: age of the product, ordered in the last 24h, in stock?, no shipping costs, special offer, rating

• Analytics

✴ Multidimensional aggregation (e.g. average revenue per category ID per day)
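To show what a multidimensional aggregation such as "average revenue per category id per day" computes, here is a plain Python sketch over in-memory records. The field names `category_id`, `day` and `revenue` are hypothetical; in Elasticsearch the same result would come from nested aggregations in a query rather than from client-side code like this.

```python
from collections import defaultdict


def average_revenue(orders):
    """Group orders by (category_id, day) and average their revenue."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [revenue sum, order count]
    for order in orders:
        key = (order["category_id"], order["day"])
        totals[key][0] += order["revenue"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}
```

Each extra dimension (for example country) would simply extend the grouping key.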

Page 14: Technology behind-real-time-log-analytics

Logstash

Page 15: Technology behind-real-time-log-analytics

Logstash

• Managing events and logs

• Collect, parse, enrich, store data

• Modular: many, many inputs and outputs

• Open Source: Apache License 2.0

• Ruby app

• Part of Elasticsearch family

Page 16: Technology behind-real-time-log-analytics

Why collect & centralize logs?

• Access log files without system access

• Shell scripting: too limited or slow

• Use unique IDs for errors and aggregate them across your stack

• Reporting (everyone can create his/her own report)

• Bonus points: unify your data to make it easily searchable

Page 17: Technology behind-real-time-log-analytics

Logstash-Architecture

Input → Filter → Output
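The three-stage flow can be sketched in Python as a chain of small functions. This only mimics the shape of the architecture; the names `input_lines`, `drop_blank`, `add_level` and `run_pipeline` are invented for illustration and are not Logstash APIs.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Event = Dict[str, str]


def input_lines(lines: Iterable[str]) -> Iterator[Event]:
    """Input stage: wrap each raw line in an event."""
    for line in lines:
        yield {"message": line.rstrip("\n")}


def drop_blank(event: Event) -> Optional[Event]:
    """Filter stage: returning None drops the event."""
    return event if event["message"].strip() else None


def add_level(event: Event) -> Optional[Event]:
    """Filter stage: enrich the event with a 'level' field when present."""
    head, sep, _ = event["message"].partition(":")
    if sep and head in {"INFO", "WARN", "ERROR"}:
        return {**event, "level": head}
    return event


def run_pipeline(lines: Iterable[str],
                 filters: List[Callable[[Event], Optional[Event]]],
                 output: Callable[[Event], None]) -> None:
    """Pass every event through the filters in order, then to the output."""
    for event in input_lines(lines):
        for apply in filters:
            event = apply(event)
            if event is None:
                break  # dropped by a filter
        else:
            output(event)  # output stage: only reached if nothing dropped it
```

For example, `run_pipeline(open("app.log"), [drop_blank, add_level], print)` would print one enriched event per non-blank line of a hypothetical `app.log`.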

Page 18: Technology behind-real-time-log-analytics

Logstash-Inputs

• Monitoring: collectd, graphite, ganglia, snmptrap, zenoss

• Datastores: elasticsearch, redis, sqlite, s3

• Queues: rabbitmq, zeromq, kafka

• Logging: eventlog, lumberjack, gelf, log4j, relp, syslog, varnish log

Page 19: Technology behind-real-time-log-analytics

Logstash-Filters

• alter, anonymize, checksum, csv, drop, multiline

• dns, date, extractnumbers, geoip, i18n, kv, noop, ruby, range

• json, urldecode, useragent
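As one example of what a filter does, the Python sketch below parses `key=value` pairs out of a message, in the spirit of the kv filter listed above. It is an illustration only, not the plugin's implementation; the function name `kv_parse` is an assumption, and its two parameters merely echo the idea of configurable separators.

```python
def kv_parse(message: str, field_split: str = " ", value_split: str = "=") -> dict:
    """Extract key/value pairs from a message; tokens without the
    value separator (or with an empty key) are ignored."""
    fields = {}
    for token in message.split(field_split):
        key, sep, value = token.partition(value_split)
        if sep and key:
            fields[key] = value
    return fields
```

Calling `kv_parse("user=alice status=200 note")` yields the two pairs and skips the bare word `note`.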

Page 20: Technology behind-real-time-log-analytics

Logstash-Outputs

• Store: elasticsearch, gemfire, mongodb, redis, riak, rabbitmq

• Monitoring: ganglia, graphite, graphtastic, nagios, opentsdb, statsd, zabbix

• Notification: email, hipchat, irc, pagerduty, sns

• Protocol: http, lumberjack, metriccatcher, stomp,

Page 21: Technology behind-real-time-log-analytics

Kibana

•Flexible analytics and data visualization platform

Page 22: Technology behind-real-time-log-analytics

Kibana

Page 23: Technology behind-real-time-log-analytics

Combine - ELK

Page 24: Technology behind-real-time-log-analytics

Hands on - ELK

[Diagram: several Web nodes and Kafka]

Page 25: Technology behind-real-time-log-analytics

Q&A