elasticsearch, logstash, kibana. cool search, analytics, data mining and more

ELASTICSEARCH , LOGSTASH, KIBANA COOL SEARCH, ANALYTICS, DATA MINING AND MORE… OLEKSIY PANCHENKO / LOHIKA / 2015

Upload: oleksiy-panchenko

Post on 15-Apr-2017


TRANSCRIPT

Page 1: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

ELASTICSEARCH, LOGSTASH, KIBANA

COOL SEARCH, ANALYTICS, DATA MINING AND MORE…

OLEKSIY PANCHENKO / LOHIKA / 2015

Page 2: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

MY NAME IS…

Oleksiy Panchenko
Software engineer, Lohika

E-mail: [email protected]: oleskiyp

LinkedIn: https://ua.linkedin.com/in/opanchenko

Page 3: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

AGENDA
• Introduction. What is it all about?
• Jump start Elastic. Demo time
• Architecture and deployment. Why is Elasticsearch elastic?
• Case studies. 4 real-life projects
• Query API in depth + Demo
• Elasticsearch ecosystem. ELK Stack + Demo
• Q & A

Page 4: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

INTRODUCTION
WHAT IS IT ALL ABOUT?

Page 5: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

HOW TO MAKE YOUR SITE SEARCHABLE?

http://www.imbusstop.com/wp-content/uploads/2015/02/websites.png

Page 6: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

• Google search
• Why not use plain vanilla SQL? RDBMS rocks! select * from books join authors on … where …
• Sphinx (hello Craigslist, Habrahabr, The Pirate Bay, 1C); Xapian
• Lucene Family: Apache Lucene, Elasticsearch, Apache Solr, Amazon CloudSearch, …

Page 7: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

WHO HAS EVER USED ELASTICSEARCH?

http://dolhomeschoolcenter.com/wp-content/uploads/2013/02/FAQ.png

Page 8: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

LUCENE AS A CORE
• Lucene = low-level Java library (JAR) which implements search functionality
• Can be used in both web and standalone applications (desktop, mobile)
• Lucene stores its index as a local binary file
• Implemented in Java, ports to other languages available
• Initial version: 1999
• Apache project since 2001
• Latest stable release: 5.2.1 (15 June 2015)

Page 9: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

LUCENE AS A CORE
• Lucene was originally written in 1999 by Doug Cutting (creator of Hadoop and Nutch, currently Chief Architect at Cloudera) as a part of an open-source web search engine (Nutch)

http://www.china-cloud.com/uploads/allimg/121018/54-12101P92R1U7.jpg

Page 10: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

MORE ABOUT SEARCH ENGINES

Riak Search

Page 11: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

TIME TO TALK ABOUT ELASTICSEARCH

https://www.elastic.co/products/elasticsearch

Near Real-Time Data (NRT)

Full-Text Search
Multilingual search, geolocation, fuzzy search, did-you-mean suggestions, autocomplete

Page 12: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

https://www.elastic.co/products/elasticsearch

High Availability

Multitenancy

Distributed, Horizontally Scalable

Page 13: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

https://www.elastic.co/products/elasticsearch

Document-Oriented

Schema-Free

Conflict Management
Optimistic Concurrency Control

Page 14: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

https://www.elastic.co/products/elasticsearch

Apache 2 Open Source License

Awesome documentation

Large community

Developer-Friendly, RESTful API
Client libraries available for many programming languages and frameworks.

Page 15: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

ELASTICSEARCH USERS

https://www.elastic.co/use-cases
https://en.wikipedia.org/wiki/Elasticsearch#Users

Page 16: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

ELASTICSEARCH – PAST & PRESENT
• 2004. Shay Banon (aka Kimchy) started working on Compass – Java Search Engine on top of Lucene
• 2010. Initial release of Elasticsearch
• Latest stable release: 1.7.1 (July 29, 2015)
• 500K downloads per month
• https://github.com/elastic/elasticsearch

http://opensource.hk/sites/default/files/u1/shay-banon.jpg

Page 17: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

ELASTICSEARCH AS A COMPANY
• 2012. Elasticsearch BV; Funding: $104M in 3 rounds, 100+ employees
• https://www.elastic.co/
• Product portfolio:
  – Elasticsearch, Logstash, Kibana (ELK stack)
  – Watcher
  – Shield
  – Marvel
  – es-hadoop
  – Found

Page 18: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

JUMP START ELASTIC

DEMO TIME

Page 19: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

INSTALLATION & CONFIGURATION
• Prerequisites:
  – JDK 6 or above (recommended: JDK 8)
  – RAM: min. 2 GB (recommended: 16–64 GB for production)
  – CPU: prefer more cores over a higher clock rate
  – Disks: SSD recommended
• Homebrew, apt, yum: apt-get install elasticsearch
• Download (ZIP, TAR, DEB, RPM): https://www.elastic.co/downloads/elasticsearch
• Installation is straightforward: https://www.elastic.co/guide/en/elasticsearch/reference/current/_installation.html

Page 20: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

LET’S TALK ABOUT TERMINOLOGY

Index ~ DB Schema

Type ~ DB Table

Document ~ Record, JSON object

Mapping ~ Schema definition in RDBMS

Page 21: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

DEMO #1

http://www.telikin.com/cms/images/shocked_senior_computer_user.jpg

Page 22: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

http://orig06.deviantart.net/a893/f/2008/017/1/f/coffee_break____by_dragonshy.jpg

Page 23: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

ARCHITECTURE AND DEPLOYMENT
WHY IS ELASTICSEARCH ELASTIC?

Page 24: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

Cluster: One or more nodes which share the same cluster name

Node: Running instance of Elasticsearch which belongs to a cluster

Shard: A portion of data – single Lucene instance. Default: 5 shards in an index

Primary Shard: Master copy of data

Replica Shard: Exact copy of a primary shard. Default: 1 replica

Page 25: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

SINGLE-NODE CLUSTER

Shards: 0 1 2 3 4

Hash Function*

{ "id": "123", "name": "john", … }

{ "id": "124", "name": "patricia", … }

{ "id": "125", "name": "scott", … }

* Also consider custom routing

Page 26: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

TWO-NODE CLUSTER

Node 1: 0 1 R2 3 R4

Node 2: R0 R1 2 R3 4

* Ability to ‘route’ indexes to particular nodes (tag-based, e.g.: ‘strong’, ‘medium’, ‘weak’)

Page 27: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

BENEFITS OF SHARDING
• Take advantage of multi-core CPUs (each shard is a separate Lucene index, so shards can be searched in parallel on different threads inside the node's JVM)
• Horizontal scalability. Dynamic rebalancing
• Fault tolerance and cluster resilience
• NB! The number of primary shards cannot be changed on the fly: changing it requires full reindexing
• Max number of documents per shard: 2,147,483,519 – imposed by Lucene

Page 28: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

CUSTOM ROUTING
• Social network. Users, events
• event_id: 17567654, 17567655, 17567656, …
  user_id: 10300, 10301, …
• No Elasticsearch ID provided: ID will be auto-generated → events will be equally distributed across the shards
• Obvious approach: Elasticsearch ID = event_id → events will be equally distributed across the shards
• Elasticsearch ID = user_id → events which belong to the same user will be stored in a single shard → no overheads → better performance
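
The routing idea on this slide can be sketched in a few lines of Python. This is only an illustration, assuming the usual "hash of the routing value modulo the number of primary shards" scheme: `zlib.crc32` is a stand-in for Elasticsearch's own hash function, and `shard_for` and the sample events are hypothetical names made up for the example.

```python
import zlib

NUM_PRIMARY_SHARDS = 5  # the default quoted on the slides


def shard_for(routing_value: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Pick a shard as hash(routing value) mod number of primary shards.

    zlib.crc32 is only a stand-in; Elasticsearch uses its own hash function.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


# Hypothetical sample data using the ID ranges from the slide.
events = [
    {"event_id": 17567654, "user_id": 10300},
    {"event_id": 17567655, "user_id": 10300},
    {"event_id": 17567656, "user_id": 10301},
]

# Routing by event_id: events of one user may be spread over several shards.
by_event = {e["event_id"]: shard_for(str(e["event_id"])) for e in events}

# Routing by user_id: every event of a given user lands on the same shard.
by_user = {e["event_id"]: shard_for(str(e["user_id"])) for e in events}
```

With routing by `user_id`, a search for one user's events only has to touch a single shard, which is the performance benefit the slide refers to.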

Page 29: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

ELASTICSEARCH NODE TYPES
• Data node: node.data = true
• Master node: node.master = true
• Communication client: http.enabled = true
• TCP ports: 9200 (ext), 9300 (int)
• A node can play 2 or 3 roles at the same time
• Multicast discovery (true by default): discovery.zen.ping.multicast.enabled

Page 30: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

DEPLOYMENT DIAGRAM

Page 31: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

INDEXING A DOCUMENT

https://www.elastic.co/guide/en/elasticsearch/guide/current/distrib-write.html

Page 32: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

RETRIEVING A DOCUMENT

https://www.elastic.co/guide/en/elasticsearch/guide/current/distrib-read.html

• In terms of retrieving documents, primary and replica shards are equivalent: data can be read from either primary or replica shard

Page 33: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

DISTRIBUTED SEARCH
• Given a search query, retrieve the 10 most relevant results

https://www.elastic.co/guide/en/elasticsearch/guide/current/_query_phase.html
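
A rough Python sketch of the scatter-gather idea behind the query phase: every shard reports only its own best hits, and the coordinating node merges those short lists into a global top N. The function name `query_phase` and the sample scores are assumptions made for illustration, not Elasticsearch internals.

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (relevance score, document id)


def query_phase(shard_hits: Dict[int, List[Hit]], size: int = 10) -> List[Hit]:
    """Merge per-shard results into one global top-`size` list."""
    # Each shard only needs to report its own best `size` hits ...
    local_tops = [heapq.nlargest(size, hits) for hits in shard_hits.values()]
    # ... and the coordinating node merges those short lists.
    merged = [hit for top in local_tops for hit in top]
    return heapq.nlargest(size, merged)


# Hypothetical per-shard matches for one query.
shards = {
    0: [(0.9, "doc-a"), (0.2, "doc-b")],
    1: [(0.7, "doc-c")],
    2: [(0.8, "doc-d"), (0.1, "doc-e")],
}
top2 = query_phase(shards, size=2)
```

Because any of the global top N could in principle live on a single shard, each shard has to return N candidates, so the cost of a request grows with both the page size and the number of shards.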

Page 34: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

CASE STUDIES
4 REAL-LIFE PROJECTS

http://vignette1.wikia.nocookie.net/fallout/images/9/9d/FNV_Rake.png/revision/latest?cb=20140618212609&path-prefix=ru

Page 35: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

GENERAL INFO
• 4 projects, ~2 years
• RDBMS (MySQL, PostgreSQL) as a primary data storage
• Both on-premise Elasticsearch installation (AWS, MS Azure) and SaaS (Bonsai @ Heroku)
• 1 or 2 instances in a cluster
• Data volume: Gigabytes; millions of documents
• Back-end: Java, Ruby

Page 36: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

#1. SOCIAL INFLUENCER MARKETING PLATFORM

http://www.nclurbandesign.org/wp-content/uploads/2015/05/blog-pic-b2c.jpg

Page 37: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

• Document types: Blog Posts, Bloggers (Influencers)
• Elasticsearch usage:
  – search and rank Influencers by category, keywords, tags, location, audience, influence
  – search blog posts by keywords etc.
• Amount of data:
  – Influencers: hundreds of thousands
  – Blog Posts: millions
• ES cluster size: 2 instances
• Technology stack: Java, MySQL, DynamoDB, AWS
• Considered alternatives: Sphinx, Apache Solr

Page 38: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

#2. JOB SITE

http://www.roberthalf.com/sites/default/files/Media_Root/Images/RH-Images/Using-a-job-search-site.jpg

Page 39: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

• Document types: Job Postings, Jobseekers
• Find relevant jobs
  – Simple one-click search
  – Advanced search (title, keywords, industry, location/distance, salary, requirements)
• Elasticsearch as a Recommendation Engine. Recommend jobs based on: previously applied/viewed jobs, location, distance, schedule etc.
• 2 types of recommendations:
  – Side banner (You also might be interested in…)
  – E-mail subscriptions every 2 weeks
• Find appropriate candidates by location, requirements (experience, education, languages), salary expectations

Page 40: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

SEARCH QUERIES
• No fixed document structure (jobs from different providers)
• Full-text search
• Fuzzy search
• Geolocation (distance)
• Weighted search: Boosted search clauses
• Dynamic scripting (Mvel until v1.4.0, then Groovy)

Page 41: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

SOME MORE FACTS
• Amount of data:
  – Job postings: ~1M
  – Applicants: ~20K
• Cluster size: 2 ‘medium’ EC2 instances
• Technology stack:
  – Ruby on Rails
  – Elasticsearch, PostgreSQL, Redis
  – Heroku + add-ons, AWS (S3, EC2)
  – Lots of 3rd party APIs and integrations

Page 42: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

IMPLEMENTATION (RUBY)
• A Model is ActiveRecord (Ruby on Rails ORM)
• ActiveRecord can persist itself to the database
• ActiveRecord::Callbacks:
  – after_commit on [:create, :update] { index_document }
  – after_commit on [:destroy] { delete_document }
  – after_create …
  – after_save …
  – after_destroy …
• Rake tasks to drop/recreate index, reindex documents
• Zero-downtime reindexing using aliases
• Ruby/Rails client: https://github.com/elastic/elasticsearch-rails
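
The alias trick behind zero-downtime reindexing can be illustrated with a small Python helper that builds the request body for the `_aliases` endpoint. The application always queries the alias; after the new index has been filled, one request moves the alias from the old index to the new one. The helper name and the index names (`jobs`, `jobs_v1`, `jobs_v2`) are hypothetical, and sending the body over HTTP is left out on purpose.

```python
import json


def alias_swap_actions(alias: str, old_index: str, new_index: str) -> dict:
    """Build an `_aliases` request body that repoints `alias` in a single request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: the app searches "jobs"; "jobs_v2" was just rebuilt.
body = alias_swap_actions("jobs", "jobs_v1", "jobs_v2")
print(json.dumps(body, indent=2))
```

Since both actions travel in the same request, searches against the alias should never see a moment where it points at no index.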

Page 43: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

LESSONS LEARNED
• On-premise deployment (EC2) vs. SaaS (Bonsai @ Heroku)
• Dynamic scripting
• PostgreSQL as a backup search engine sucks

Page 44: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

#3. CAR TRADING

http://bigskybeetles.com/wp-content/uploads/2014/12/restored-beetle-car.png

Page 45: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

PARSING ADS

Price

$3900

Page 46: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

1996 VW PASSAT SEDAN B4 TDI TURBO DIESEL 44+MPG
WAT???
• Fuzzy Search (Levenshtein Distance Algorithm) used to parse ads and classify cars
• Elasticsearch index contains dictionary (Year, Make, Model, Trim)
• Used in conjunction with other approaches: regular expressions, dictionaries of synonyms (VW → Volkswagen, Chevy → Chevrolet), normalization (e.g. LX-370 → LX370)
• Algorithm approach:
  – Parse Year (1996)
  – Search most relevant Make (VW, volkswagon → Volkswagen)
  – Search most relevant Model (Passat) for Make = Volkswagen, Year = 1996
  – Search most relevant Trim (TDi 4dr Sedan)
• Parsing quality: 90%
https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-fuzzy-query.html
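
To make the "search most relevant Make" step concrete, here is a plain-Python sketch of Levenshtein distance and a closest-match lookup. It is not the project's code and not how Elasticsearch implements fuzzy queries internally; the tiny `MAKES` dictionary and the `closest_make` helper are assumptions for illustration only.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


# Hypothetical dictionary of makes; the real index also holds Year, Model, Trim.
MAKES = ["Volkswagen", "Volvo", "Chevrolet"]


def closest_make(token: str, makes: List[str] = MAKES) -> str:
    """Return the dictionary entry with the smallest edit distance to `token`."""
    return min(makes, key=lambda make: levenshtein(token.lower(), make.lower()))
```

A misspelling such as "volkswagon" is one substitution away from "Volkswagen", so it resolves to the right make, while an abbreviation like "VW" still needs the synonym dictionary mentioned on the slide.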

Page 47: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

#4. [NDA]

http://cdn.4glaza.ru/images/products/large/0/bresser-junior-loupe-2x-4x-dop6.jpg

Page 48: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

SOME UNCOVERED INFO
• Check documents against duplicate content
• Shingle analysis (commonly used by copywriters and SEO experts)
  – I have a dream that one day this nation will rise up and live…
  – Normalization: I have a dream that one day this nation will rise up and live…
  – Splitting a text into shingles (n-grams), n = 3..10:
    have dream that
    dream that this
    that this nation
    this nation will
    …
  – Replacement: latin ‘c’ cyrillic ‘c’
• Custom or standard ES implementation of Shingle analysis
https://en.wikipedia.org/wiki/W-shingling
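
The shingling step can be sketched in Python as follows. This is a minimal illustration of word-level shingles and of comparing two texts by the overlap of their shingle sets (Jaccard resemblance); it assumes the text has already been normalized into a token list, and the function names are made up for the example.

```python
from typing import List, Tuple


def shingles(tokens: List[str], n: int = 3) -> List[Tuple[str, ...]]:
    """Return all runs of n consecutive tokens (word-level n-grams)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def resemblance(a: List[str], b: List[str], n: int = 3) -> float:
    """Jaccard overlap of two shingle sets: 1.0 means identical sets."""
    sa, sb = set(shingles(a, n)), set(shingles(b, n))
    if not sa and not sb:
        return 0.0  # both texts are shorter than n tokens: nothing to compare
    return len(sa & sb) / len(sa | sb)


# The normalized tokens implied by the shingles listed on the slide.
tokens = "have dream that this nation will".split()
slide_shingles = shingles(tokens, n=3)
```

Two documents that share a large fraction of their shingles are likely duplicates or light rewrites of each other, which is what the duplicate-content check relies on.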

Page 49: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

QUERY API IN DEPTH
+ DEMO

Page 50: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

FILTERS VS. QUERIES
As a general rule, filters should be used:
• for binary yes/no searches
• for queries on exact values

Filters are much faster than queries
Filters are usually great candidates for caching

27 Filters available (Elasticsearch 1.7.1)

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-filters.html

Page 51: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

QUERIES VS. FILTERS
As a general rule, queries should be used instead of filters:
• for full text search
• where the result depends on a relevance score

Common approach: Filter as many records as possible, then query them.

38 Queries available (Elasticsearch v 1.7.1)

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-queries.html
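
The "filter first, then query" approach can be shown as a request body built in Python. The sketch uses the `filtered` query of the Elasticsearch 1.x DSL that the talk covers (later major versions express the same idea with a `bool` query and its `filter` clause). The field names `title`, `country` and `salary` and the helper name are hypothetical.

```python
import json
from typing import List


def filtered_query(text: str, field: str, filters: List[dict]) -> dict:
    """Build a 1.x-style search body: exact-value filters plus a scored full-text query."""
    return {
        "query": {
            "filtered": {
                # Scored part: full-text match, contributes to relevance.
                "query": {"match": {field: text}},
                # Unscored part: yes/no conditions, cheap and cacheable.
                "filter": {"bool": {"must": filters}},
            }
        }
    }


# Hypothetical job search: exact country and minimum salary, fuzzy title match.
body = filtered_query(
    "ruby developer",
    "title",
    [{"term": {"country": "ua"}}, {"range": {"salary": {"gte": 1000}}}],
)
print(json.dumps(body, indent=2))
```

The filters narrow down the candidate set without computing scores, and only the remaining documents are ranked by the `match` query.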

Page 52: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

DEMO #2

http://www.socialtalent.co/wp-content/uploads/blog-content/computer-user-confused.jpg

Page 53: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

SOME THEORY BEHIND RELEVANCE SCORING
full AND text AND search AND (elasticsearch OR lucene)

• Term Frequency: How often does the term appear in the document?

• Inverse Document Frequency: How often does the term appear in all documents in the collection?

• Field-length norm: How long is the field?

• TF, FLN etc. are calculated and stored at index time
https://www.elastic.co/guide/en/elasticsearch/guide/current/scoring-theory.html

http://blog.qbox.io/optimizing-search-results-in-elasticsearch-with-scoring-and-boosting
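
The three factors can be written down as small Python functions. The formulas follow the classic Lucene TF/IDF similarity as described in the scoring-theory guide linked above; combining them as a plain product in `term_weight` is a deliberate simplification for illustration, since the full scoring function has more terms (boosts, query normalization, coordination).

```python
import math


def term_frequency(freq: int) -> float:
    """tf = sqrt(number of times the term appears in the field)."""
    return math.sqrt(freq)


def inverse_document_frequency(num_docs: int, doc_freq: int) -> float:
    """idf = 1 + ln(numDocs / (docFreq + 1)): rarer terms weigh more."""
    return 1 + math.log(num_docs / (doc_freq + 1))


def field_length_norm(num_terms: int) -> float:
    """norm = 1 / sqrt(number of terms in the field): shorter fields weigh more."""
    return 1 / math.sqrt(num_terms)


def term_weight(freq: int, num_docs: int, doc_freq: int, num_terms: int) -> float:
    """Simplified per-term weight: the product of the three factors above."""
    return (
        term_frequency(freq)
        * inverse_document_frequency(num_docs, doc_freq)
        * field_length_norm(num_terms)
    )
```

For example, a term that occurs 4 times in a 4-term field gets tf = 2.0 and norm = 0.5, and the idf factor then scales the result up for terms that appear in few documents.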

Page 54: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

MORE COOL FEATURES
• Indexing attachments: MS Office, ePub, PDF (Apache Tika)
• Autocomplete suggestion:

• Did-you-mean suggestion:

• Highlight results:

Page 55: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

SEARCH IMAGES

https://www.theloopyewe.com/shop/search/cd/0-100~75-90-50~18-12-12/g/59A9BAC5/
https://github.com/kzwang/elasticsearch-image

Page 56: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

http://orig06.deviantart.net/a893/f/2008/017/1/f/coffee_break____by_dragonshy.jpg

Page 57: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

ELASTICSEARCH ECOSYSTEM. ELK STACK
+ DEMO

Page 58: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

CLIENTS

http://blog.euranova.eu/wp-content/uploads/2014/04/programming-languages.png

Page 59: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

• Java: 1 native client + 1 community supported
• Python: 1 official + 7 community supported
• Ruby: 1 official + 7 community supported
• JavaScript: 1 official + 4
• PHP: 1 official + 4
• C#, .NET: 1 official + 2
• Scala: 4
• Groovy (1), Haskell (1), Perl (1), Clojure (1), Go (3), R (2), Erlang (3), OCaml (2), Smalltalk (1), ColdFusion (1), C++ (1)
• Command Line (2)
https://www.elastic.co/guide/en/elasticsearch/client/community/current/clients.html

Page 60: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

INTEGRATIONS
• Django
• Ruby on Rails
• Spring, Spring Data
• Node.js
• Symfony, Drupal, Wordpress
• Grails
• Play! Framework

https://www.elastic.co/guide/en/elasticsearch/client/community/current/integrations.html

Page 61: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

FRONT ENDS

http://php.archive.razorflow.com/assets/img/header_v1.png

Page 62: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

ELASTICSEARCH-HEAD

http://mobz.github.io/elasticsearch-head/

Page 63: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

ESCLIENT

https://github.com/rdpatil4/ESClient

Page 64: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

AVAILABLE FRONT ENDS

https://www.elastic.co/guide/en/elasticsearch/client/community/current/front-ends.html

• elasticsearch-head: A web front end for an Elasticsearch cluster.

• browser: Web front-end over elasticsearch data.
• Inquisitor: Front-end to help debug/diagnose queries and analyzers
• Hammer: Web front-end for elasticsearch
• Calaca: Simple search client for Elasticsearch
• ESClient: Simple search, update, delete client for Elasticsearch

Page 65: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

HEALTH AND PERFORMANCE

http://www.transcend-marketing.co.uk/wp-content/uploads/2014/09/health-check2.png

Page 66: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

ELASTICSEARCH-HEAD

https://github.com/mobz/elasticsearch-head

Page 67: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

BIGDESK

https://github.com/lukas-vlcek/bigdesk

Page 68: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

WHATSON

https://github.com/xyu/elasticsearch-whatson

Page 69: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

ELASTICOCEAN

https://itunes.apple.com/us/app/elasticocean/id955278030

Page 70: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

HEALTH AND PERFORMANCE

https://www.elastic.co/guide/en/elasticsearch/client/community/current/health.html

• bigdesk: Live charts and statistics for elasticsearch cluster.
• Kopf: Live cluster health and shard allocation monitoring with administration toolset.
• paramedic: Live charts with cluster stats and indices/shards information.
• ElasticsearchHQ: Free cluster health monitoring tool
• SPM for Elasticsearch: Performance monitoring with live charts showing cluster and node stats, integrated alerts, email reports, etc.
• check-es: Nagios/Shinken plugins for checking on elasticsearch
• check_elasticsearch: An Elasticsearch availability and performance monitoring plugin for Nagios.
• opsview-elasticsearch: Opsview plugin written in Perl for monitoring Elasticsearch
• SegmentSpy: Plugin to watch Lucene segment merges across your cluster
• es2graphite: Send cluster and indices stats and status to Graphite for monitoring and graphing.
• Scout: Provides plugins for monitoring Elasticsearch nodes, clusters, and indices.
• ElasticOcean: Elasticsearch & DigitalOcean iOS real-time monitoring tool to keep an eye on DigitalOcean Droplets, Elasticsearch instances, or both, on the go.

Page 71: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

10 ES METRICS TO WATCH

http://radar.oreilly.com/2015/04/10-elasticsearch-metrics-to-watch.html

1. Cluster health – nodes and shards
2. Node performance – CPU
3. Node performance – memory usage
4. Node performance – disk I/O
5. Java – heap usage and garbage collection
6. Java – JVM pool size
7. Search performance – request latency and request rate
8. Search performance – filter cache
9. Search performance – field data cache
10. Indexing performance – refresh times and merge times

Page 72: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

RIVERS (DEPRECATED IN 1.5.0)

http://acuate.typepad.com/.a/6a0120a5e84a91970c01539381efff970b-pi

Page 73: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

• JDBC River Plugin, CSV River Plugin
• MongoDB, CouchDB, Solr, Redis, Neo4j, DynamoDB, RethinkDB, Hazelcast, …
• JMS, RabbitMQ, ActiveMQ, Amazon SQS, Kafka, …
• Twitter, Wikipedia, Git, GitHub, Subversion, RSS, …
• FileSystem, Dropbox, Google Drive, Amazon S3, …
• IMAP/POP3, Web, LDAP

https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-plugins.html#river

Page 74: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

OTHER PLUGINS

https://d2wucpkmh57zie.cloudfront.net/wp-content/uploads/2015/04/plugins-together.jpg

Page 75: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

• Internationalization, normalization, analysis, language support (Chinese, Japanese, Khmer, Thai etc.), transliteration etc.
• Discovery plugins: Amazon AWS, MS Azure, Google GCE, ZooKeeper
• Transport plugins: allow using the Elasticsearch REST API over Servlet, ZeroMQ, Jetty, Redis, Memcached
• Scripting in Elasticsearch queries: Groovy, JavaScript, Python, Clojure, SQL (!)
• Front-ends (CRUD operations) & data visualization
• Snapshot/Restore Repository: HDFS, AWS S3, GridFS
• Misc: Attachments handling (uses Apache Tika), image support, tracking changes, Mock Solr, NewRelic integration, …

https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-plugins.html

Page 76: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

ELASTICSEARCH PRODUCT PORTFOLIO

http://blog.archisnapper.com/wp-content/uploads/architecture-portfolio.jpg

Page 77: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

FOUND ($)
• Elasticsearch as a service
• Starts from $45/mo (1GB RAM, 8GB SSD, 1 data center)
• No deployment and maintenance overhead

https://www.elastic.co/products/found

Page 78: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

SHIELD ($)
• Authentication
• Authorization: RBAC
• Encrypted communication, IP filtering
• Audit logging
• Other approaches:
  – Jetty instead of embedded server
  – Nginx as a front-end

https://www.elastic.co/products/shield

Page 79: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

MARVEL ($)
• Elasticsearch cluster health check, monitoring, performance
• Real-time and historical analysis
• Customizable dashboards

https://www.elastic.co/products/marvel

Page 80: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

WATCHER
• Alerts about anomalies in data
• Proactive monitoring of ES cluster (in conjunction with Marvel)
• Many notification channels: e-mail, SMS, webhooks
• Retrospective analysis
• High availability

https://www.elastic.co/products/watcher

Page 81: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

ELK

https://pbs.twimg.com/media/CCAkRqVXIAA9cDE.png

Page 82: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

LOGSTASH + ELASTIC + KIBANA

Page 83: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

LOGSTASH ADVANCED

Page 84: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

LOGSTASH

• Variety of inputs and outputs (165 plugins)
• 120 predefined patterns + custom log formats
• Flexible DSL to parse/normalize/enrich logs
• Implemented in Ruby, running on JRuby

https://www.elastic.co/products/logstash

Page 85: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

SOME LOGSTASH INPUTS

https://www.elastic.co/guide/en/logstash/current/input-plugins.html

• file
• stdin
• syslog
• eventlog
• jdbc
• varnishlog
• websocket
• log4j
• jmx
• s3
• sqs
• rss
• redis
• rabbitmq
• zeromq
• kafka
• twitter
• elasticsearch
• github
• lumberjack

Page 86: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

SOME LOGSTASH OUTPUTS

https://www.elastic.co/guide/en/logstash/current/output-plugins.html

• file
• stdout
• csv
• exec
• elasticsearch
• email
• nagios
• syslog
• redis
• loggly
• jira
• hipchat
• irc
• graphite
• http
• s3
• sqs
• sns
• rabbitmq
• zeromq

Page 87: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

KIBANA
• Variety of charts: bar charts, line and scatter plots, histograms, pie charts, maps
• Flexible and customizable UI, responsive design
• Slice and dice data to get necessary details
• Seamless integration with Elasticsearch
• Simple data export

https://www.elastic.co/products/kibana

Page 88: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

DEMO #3

http://25.media.tumblr.com/tumblr_mbduvkuspZ1qe6vsbo1_400.jpg

Page 89: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

ELASTICSEARCH DRAWBACKS
• No transaction support. Elasticsearch is not a database.
• No joins, constraints and other RDBMS features
• Durability and consistency issues, data loss:
  – https://aphyr.com/posts/323-call-me-maybe-elasticsearch-1-5-0
  – https://www.elastic.co/guide/en/elasticsearch/resiliency/current/index.html

Page 90: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

PERFORMANCE?

http://blog.socialcast.com/realtime-search-solr-vs-elasticsearch/
http://solr-vs-elasticsearch.com/

• Apache Solr can be faster than ES in search-only scenarios, while Elasticsearch usually outperforms Solr when doing writes and reads concurrently
• Sphinx is faster at indexing (up to 15MB/s per core)
• Performance issues can usually be fixed by horizontal scaling

Page 91: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

SUMMARY
• ES is not a silver bullet, but it is a really powerful tool
• Elasticsearch is not an RDBMS and is not supposed to act as a database. Choose your tools properly. Leverage the synergy of DB + ES
• Elasticsearch is dead simple at the start but might become sophisticated as you go
• Kick off easily, then hire a good DevOps engineer for best results
• The ecosystem around Elasticsearch is just amazing
• Give it a try – it can bring a lot of value to your product and your CV ;)
http://www.aperfectworld.org/clipart/gestures/rockhard11.png

Page 92: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

QUESTIONS?

http://dolhomeschoolcenter.com/wp-content/uploads/2013/02/FAQ.png

Page 93: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

THANK YOU!

http://conveyancingderby.co/wp-content/uploads/2011/07/cat-card.jpg

Page 94: Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more

USEFUL LINKS
• Elasticsearch: https://www.elastic.co/products/elasticsearch
• Logstash: https://www.elastic.co/products/logstash
• Kibana: https://www.elastic.co/products/kibana
• Scripts for the demos: https://github.com/opanchenko/morning-at-lohika-ELK