search and analyze your data with elasticsearch
TRANSCRIPT
SEARCH AND ANALYZE YOUR DATA WITH ELASTICSEARCH
Anton Udovychenko
JEEConf May 20, 2016
ABOUT ME
• Software Architect @ Levi9
• 8+ years of Java experience
• Passionate about agile methodology and clean code
http://ua.linkedin.com/in/antonudovychenko
http://www.slideshare.net/antonudovychenko
AGENDA
• Why does search matter to you
• Why Elasticsearch
• Basic Concepts
• Comparison with SQL
• Elasticsearch usage
• Elasticsearch and Java
• Q&A
WHY DOES SEARCH MATTER TO YOU
WHAT IS IT ABOUT
Elasticsearch is a distributed, open source, document-oriented, schema-free, RESTful, full-text search and analytics engine, designed for horizontal scalability and high availability.
WHY ELASTICSEARCH
Apache 2.0 License
{ "title": "My blogpost", "body": "Having a lot of text...", "user": "es_user", "postDate": "2016-01-01 15:03:32" }
REST API
WHY ELASTICSEARCH - ALTERNATIVES
Lucene
– Complex logic (no additional level of abstraction)
+ More fine-grained control
= Elasticsearch is based on Lucene
WHY ELASTICSEARCH - ALTERNATIVES
Sphinx
+ Faster on a cold start
+ Occupies less memory
= Not Java-based (C++)
– Proprietary protocol
– Real-time caveats
– Difficult to move to the cloud
– More difficult to start using
– Smaller community
WHY ELASTICSEARCH - ALTERNATIVES
Solr
+ Truly open-source
+ Primary support of Hadoop distributors
+ ZooKeeper is more mature than Zen
= Near real-time search
= Similar performance
– More difficult to start using
– SolrCloud (vs ES out of the box)
– ZooKeeper is harder to use than Zen
– Worse operational tools
– Worse monitoring tools
– Worse analytical abilities
BASIC CONCEPTS
• Near realtime
• Cluster
• Node
• Index
• Type
• Document
• Shards and replicas
BASIC CONCEPTS
[Diagram build: Cluster, Node (x3), Shard (x8), Index]
BASIC CONCEPTS
[Diagram: Shard, Segment (x4), Lucene Index]
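As a rough illustration of how the documents of one index end up spread over several shards, the sketch below picks a primary shard by hashing a routing value (by default the document id) and taking the remainder by the number of primary shards. Elasticsearch applies the same idea with its own Murmur3 hash; `zlib.crc32` is only a stand-in here, and the function name and the shard count of 5 are assumptions made for this example.

```python
import zlib


def pick_shard(routing: str, number_of_primary_shards: int) -> int:
    """Return the primary shard number for a routing value (by default the document id)."""
    if number_of_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    # Stand-in hash: Elasticsearch itself uses Murmur3, not CRC32.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


if __name__ == "__main__":
    # Hypothetical index with 5 primary shards.
    for doc_id in ["1", "2", "3"]:
        print("document", doc_id, "-> shard", pick_shard(doc_id, 5))
```

Because the shard number depends on the number of primary shards, that number is chosen when the index is created, while the number of replicas can be changed later.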
BASIC CONCEPTS
Segment core

Inverted index
Term | Freq | DocIds
brown | 2 | 0,1
dog | 2 | 0,1
fox | 2 | 0,1
in | 1 | 1
jump | 2 | 0,1
lazy | 2 | 0,1
over | 2 | 0,1
quick | 2 | 0,1
summer | 1 | 1
the | 1 | 0

Document store
DocId | Fields
0 | Text: The quick brown fox jumped over the lazy dog; Author: Bob
1 | Text: Quick brown foxes leap over lazy dogs in summer; Author: Bill

Column store (columns: Likes, Shared)
DocId 0: 210   DocId 1: 90
DocId 0: 59    DocId 1: 23
BASIC CONCEPTS
Segment core
Search term: Leaping brown Fox
[The inverted index, document store and column store are the same as on the previous slide]
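The inverted index from the segment slides can be sketched in a few lines of Python. This is a toy model, not Lucene's implementation: the `NORMALIZE` table is a hand-written stand-in for a real analyzer (stemming plus synonyms), and the `search` helper requires every query term to be present, which is a simplification chosen for this example.

```python
import re
from collections import defaultdict

# Toy normalization table standing in for a real analyzer (stemming + synonyms).
# These mappings are assumptions chosen to fit the two example documents.
NORMALIZE = {
    "jumped": "jump", "leap": "jump", "leaping": "jump",
    "foxes": "fox", "dogs": "dog",
}

DOCS = [
    "The quick brown fox jumped over the lazy dog",
    "Quick brown foxes leap over lazy dogs in summer",
]


def analyze(text):
    """Lowercase the text, split it into words, then normalize each token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [NORMALIZE.get(token, token) for token in tokens]


def build_inverted_index(docs):
    """Map each term to the sorted list of ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in analyze(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, query):
    """Return the ids of documents that contain every term of the analyzed query."""
    result = None
    for term in analyze(query):
        ids = set(index.get(term, []))
        result = ids if result is None else result & ids
    return sorted(result or [])


if __name__ == "__main__":
    index = build_inverted_index(DOCS)
    for term in sorted(index):
        # term, number of documents containing it, document ids
        print(term, len(index[term]), index[term])
    print(search(index, "Leaping brown Fox"))
```

The query is passed through the same `analyze` step as the documents, which is why "Leaping brown Fox" can find documents that only contain "jumped" or "leap".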
SQL
ELASTIC
COMPARISON WITH SQL
SQL | Elasticsearch
Database | Index
Table | Type
Row | Document (with properties)
Column / field | Field
COMPARISON WITH SQL
id | title | body | user | postDate
1 | My first blogpost | Having a lot of text... | es_user | 2016-01-01 15:03:32
2 | About search | The search data sometimes has a peculiar property… | es_user | 2016-01-01 19:22:03
3 | Introduction to Elasticsearch | Once I have stumbled upon this idea… | es_user | 2016-01-03 11:55:41
COMPARISON WITH SQL
SQL:
    CREATE DATABASE blog;
    USE blog;
    CREATE TABLE post(
        id bigint(20) AUTO_INCREMENT,
        title varchar(250),
        body text,
        user varchar(50),
        postDate timestamp,
        PRIMARY KEY(id)
    );
Elasticsearch (the mapping is not obligatory):
    POST http://localhost:9200/blog
    {
      "mappings": {
        "post": {
          "properties": {
            "title": { "type": "string" },
            "body": { "type": "string" },
            "user": { "type": "string" },
            "postDate": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss" }
          }
        }
      }
    }
COMPARISON WITH SQL (CREATE)
SQL:
    INSERT INTO post(title, body, user, postDate)
    VALUES('My blogpost', 'Having a lot of text...', 'es_user', '2016-01-01 15:03:32');
Elasticsearch:
    POST http://localhost:9200/blog/post
    { "title": "My blogpost", "body": "Having a lot of text...", "user": "es_user", "postDate": "2016-01-01 15:03:32" }
COMPARISON WITH SQL (UPDATE)
SQL:
    UPDATE post SET title='My blogpost' WHERE id=1;
Elasticsearch:
    POST http://localhost:9200/blog/post/1/_update
    { "doc": { "title": "My blogpost" } }
COMPARISON WITH SQL (DELETE)
SQL:
    DELETE FROM post WHERE id=1
Elasticsearch:
    DELETE http://localhost:9200/blog/post/1
COMPARISON WITH SQL (READ)
SQL:
    SELECT * FROM post WHERE id=1
Elasticsearch:
    GET http://localhost:9200/blog/post/1

SQL:
    SELECT * FROM post
Elasticsearch:
    GET http://localhost:9200/blog/post/_search

SQL:
    SELECT * FROM post WHERE user='es_user'
Elasticsearch:
    GET http://localhost:9200/blog/post/_search?q=user:es_user
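The REST calls from the CREATE, UPDATE, DELETE and READ slides can be prepared from any HTTP client. The sketch below builds them with Python's standard `urllib.request` without sending anything, so it runs without a cluster. The `build_request` helper is a hypothetical name introduced for this example; the URLs and bodies are the ones from the slides and assume a node on `localhost:9200`.

```python
import json
import urllib.request

# Index "blog" and type "post", as used on the slides.
BASE_URL = "http://localhost:9200/blog/post"


def build_request(method, path="", body=None):
    """Build (but do not send) an HTTP request against the blog/post type."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(BASE_URL + path, data=data, method=method)
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


create = build_request("POST", body={
    "title": "My blogpost",
    "body": "Having a lot of text...",
    "user": "es_user",
    "postDate": "2016-01-01 15:03:32",
})
update = build_request("POST", "/1/_update", {"doc": {"title": "My blogpost"}})
delete = build_request("DELETE", "/1")
read = build_request("GET", "/1")
search = build_request("GET", "/_search?q=user:es_user")

if __name__ == "__main__":
    for request in (create, update, delete, read, search):
        print(request.get_method(), request.full_url)
    # To send a request to a running node, pass it to urlopen, for example:
    # with urllib.request.urlopen(create) as response:
    #     print(response.read().decode("utf-8"))
```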
COMPARISON WITH SQL (READ)
SQL:
    SELECT * FROM post WHERE body LIKE '%Having %';
Elasticsearch:
    POST http://localhost:9200/blog/post/_search
    { "query": { "match": { "body": "Having" } } }
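The `LIKE` and `match` pair above is an approximation: `LIKE '%...%'` tests for a substring of the raw text, while a `match` query analyzes the query text and compares whole terms. A minimal sketch of the difference follows, assuming a very simple analyzer (lowercase, split into words) and a case-sensitive substring test; real analyzers and SQL collations vary, and both function names are made up for this example.

```python
import re


def sql_like_contains(text, needle):
    """Rough stand-in for SQL LIKE '%needle%': a raw substring test."""
    return needle in text


def analyze(text):
    """Very simple analyzer: lowercase the text and split it into word terms."""
    return re.findall(r"\w+", text.lower())


def match(text, query):
    """Simplified match query: true if any analyzed query term equals a field term."""
    field_terms = set(analyze(text))
    return any(term in field_terms for term in analyze(query))


if __name__ == "__main__":
    body = "Having a lot of text..."
    print(sql_like_contains(body, "Having "))  # True: the substring is present
    print(match(body, "having"))               # True: the term matches after lowercasing
    print(sql_like_contains(body, "Hav"))      # True: any substring is enough
    print(match(body, "Hav"))                  # False: "hav" is not a whole term
```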
DEMO TIME
ELASTICSEARCH AND JAVA
• Native Java client
• Spring Data Elasticsearch
• REST endpoints
• Jest (https://github.com/searchbox-io/Jest)
https://github.com/terrafant/es-feeder
DEMO TIME
ELASTICSEARCH USAGE
[Diagram: a Request reaches the Application, which talks to the DB over JDBC (SQL) and to the Elasticsearch cluster through an ES client, over REST (JSON) or the native protocol (binary)]
ELASTICSEARCH USAGE (DETAILS)
[Diagram: Elasticsearch cluster with a load balancer, three client nodes, one master node, two master-eligible nodes and twelve data nodes]
ELASTICSEARCH USAGE (ELK)
[Diagram components: Browser, Frontend, Backend, DB, Logstash (x3), Broker, Elasticsearch, Kibana]
TOP 10 PRODUCTION RECOMMENDATIONS
1. Take care of security
2. Avoid split-brain
3. Use dedicated master nodes
4. Use unicast (not multicast)
5. Configure recovery settings
6. Number of replicas is not less than 2
7. Allocate enough physical memory
8. Configure OS user
9. Use monitoring tools
10. Use Oracle JDKs
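For recommendation 2, the usual rule with Zen discovery is to require a quorum of master-eligible nodes before a master can be elected, that is (N / 2) + 1 with integer division, and to put that value into the `discovery.zen.minimum_master_nodes` setting. The helper below is a small sketch of that arithmetic; the function name is an assumption made for this example.

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Quorum of master-eligible nodes: (N / 2) + 1, using integer division."""
    if master_eligible_nodes < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


if __name__ == "__main__":
    for nodes in (1, 2, 3, 5):
        print(nodes, "master-eligible nodes -> quorum of", minimum_master_nodes(nodes))
```

With three master-eligible nodes the quorum is 2, so the cluster survives the loss of one of them, and a single isolated node cannot elect itself master.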
THANK YOU!
Get social: @elastic
Explore the docs: elastic.co/guide
Give it a try: elastic.co/downloads/elasticsearch
Join the community: discuss.elastic.com
Check ELK stack: demo.elastic.co