elk - what's new and showcases

ELASTICSEARCH & CO.What’s new?

tech talk @ ferret

Andrii Gakhov

NEW BRAND

www.elastic.co

ELKopen source data visual izat ion platform that allows you to interact with your data through stunning, powerful graphics.

distributed, open source search and ana ly t i cs eng ine , des igned for horizontal scalability, reliability, and easy management.

flexible, open source data collection, parsing, and enrichment pipeline.

Shield brings enterprise-grade security to Elasticsearch, protecting the entire ELK stack with encrypted communications, authentication, role-based access control and auditing.

comprehensive tool that provides you with complete transparency into the s t a t u s o f yo u r E l a s t i c s e a r c h deployment.

Elasticsearch 1.4.4 Kibana 4.0.1

Logstash 1.4.2Marvel

Shield 1.0.1

SHIELD

Security as a Plugin Security features for Elasticsearch are implemented in a plugin that you install on each node in your cluster.

ARCHITECTURE NOTES

• The plugin intercepts inbound API calls in order to enforce authentication and authorization.

• The plugin provides encryption using Secure Sockets Layer/Transport Layer Security (SSL/TLS) for the network traffic to and from the Elasticsearch node.

• The plugin uses the API interception layer that enables authentication and authorization to provide audit logging capability.

MAIN FEATURES• User Authentication

Shield defines (realm) a known set of users in order to authenticate users that make requests. The supported realms are esusers and LDAP.

• Authorization Shield’s data model for action authorization includes: Secured Resource, Privilege, Permissions, Role, Users

• Node Authentication and Channel Encryption Shield use SSL/TLS to wrap usual node communication over port 9300. When SSL/TLS is enabled, the nodes validate each other’s certificates, establishing trust between the nodes.

• IP Filtering Shield provides IP-based access control for Elasticsearch nodes that allows to restrict which other servers, via their IP address, can connect to Elasticsearch nodes and make requests.

• Auditing The audit functionality in a secure Elasticsearch cluster logs particular events and activity on that cluster. The events logged include authentication attempts, including granted and denied access.

KIBANA

Kibana 4 provides dozens of new features that enable you to compose questions, get answers, and solve problems like never before.

WHAT’S NEW?• New interface with D3, drag&drop dashboard builder• New diagrams: Area Chart, Data Table, Markdown Text Widget, Pie Chart,

Raw Document Widget, Single Metric Widget, Tile Map, Vertical Bar Chart• Advanced aggregation-based analytics capabilities: Unique counts

(cardinality), Non-date histograms, Ranges, Significant terms, Percentiles etc.• Expressions-based scripted fields enable you to perform ad-hoc analysis by

performing computations on the fly• Search result highlighting• Ability to save searches and visualizations• Faster dashboard loading due to a reduction in the number HTTP calls

needed to load the page• SSL encryption for client requests as well as requests to and from

Elasticsearch

ELASTICSEARCH

WHAT’S NEW? SINCE 1.2.0

• Upgraded to Lucene 4.10.1 release• New aggregations: percentiles_rank, top_hits, cardinality,

scripted_metric, …• Added sum of the doc counts of other buckets in terms aggs• Added support bounding box aggregation on geo_shape/

geo_point data types• Parent/child optimization• Added support for scripted upserts• Fielddata and cache optimisation• Removed deprecated gateway functionality• …

PERCENTILES RANK AGGREGATIONA multi-value metrics aggregation that calculates one or more percentile ranks over numeric values extracted from the aggregated documents.

{ “aggs” : { “load_time_outlier” : { “percentile_ranks” : { “field” : “load_time”, “values” : [15, 30] } } }}

{ “aggregations” : { “load_time_outlier” : { “values” : { “15”: 92, “30”: 100 } } }}

Example above shows that 92% of page were loaded within 15 sec, and 100% within 30 sec.

TOP HITS AGGREGATIONA top_hits metric aggregator keeps track of the most relevant document being aggregated. This aggregator is intended to be used as a sub aggregator, so that the top matching documents can be aggregated per bucket.

{ “aggs”: { “top_logs”: {

“top_hits”: { “sort": [ { “created_at”: { “order”: “desc” } } ], “_source”: { “include”: [ “path” ] }}

}

{“aggregations”: { “top_logs”: { “hits”: { “total”: 180 “hits”: [ {

“_index”: “logs”,“_type”: “log”,“_id”: “an893d30mlss”,“_source”: { “path”: “/home/user/”}sort: [ 1422388801000 ]

…}

CARDINALITY AGGREGATIONA single-value metrics aggregation that calculates an approximate count of distinct values. It is based on the HyperLogLog++ algorithm, which counts based on the hashes of the values with some interesting properties:• configurable precision, which decides on how to trade memory for accuracy,• excellent accuracy on low-cardinality sets,• fixed memory usage: no matter if there are tens or billions of unique values,

memory usage only depends on the configured precision.

{ “aggs” : { “tags_count” : { “cardinality” : { “field” : “tags”, “precision_threshold”: 100 } } }}

{ “aggregations” : { “tags_count” : { “value”: 120002 } }}

SCRIPTED METRIC AGGREGATIONA metric aggregation that executes using scripts to provide a metric output.

{ “aggs” : { "profit": { "scripted_metric": { "init_script" : "_agg['transactions'] = []", "map_script" : "if (doc['type'].value == \"sale\") { _agg.transactions.add(doc['amount'].value) } else { _agg.transactions.add(-1 * doc['amount'].value) }", "combine_script" : "profit = 0; for (t in _agg.transactions) { profit += t }; return profit", "reduce_script" : "profit = 0; for (a in _aggs) { profit += a }; return profit" } }}

SHOWCASES

PROBLEM I

{ “location”: { “type”: “geo_point” }, “tags”: { “type”: “string”, “index”: “not_analyzed” }, “text”: { “type”: “string”, “index”: “not_analyzed” }}

Find most popular tags per location (e.g. grouping by geohash with precision 10km x 10km)

SOLUTIONuse geohash_grid and terms aggregations

{ “aggs”: { “hotspots”: { “geohash_grid” : { “field”: “location”, “precision”: 10 }, "aggs": { “top_tags": { "terms": { “field”: “tags” } …}

RESPONSE EXAMPLE“aggregations”: { “hotspots”: { “buckets”: [ { "key": "dr5rs", "doc_count": 2 “top_tags”: { “buckets”: [

{ “key”: “#NY” “doc_count”: 20001 }, { “key”: “#Obama” “doc_count”: 1201 }, … ] }

},…

] …

PROBLEM II

{ “event”: { “type”: “string”, “index”: “not_analyzed" }, “rating”: { “type”: “float” } }}

Find total number of records and average rating for events with most number of rating records

SOLUTION

{ “aggs”: {

“top_events”: { “terms”: { “field”: “event” }, “aggs”: { “avg_rating”: { “avg”: { “field”: “rating” }

…}

use terms and avg aggregations

RESPONSE EXAMPLE“aggregations”: { “top_events”: { “buckets”: [ { “key”: “Venus Berlin” “doc_count”: 36665, “avg_rating”: {

“value”: 9.991 } },

{ “key”: “ITB Berlin” “doc_count”: 365, “avg_rating”: {

“value”: 8.46 } }

…

PROBLEM III

{ “tags”: { “type”: “string”, “index”: “not_analyzed" }, “keywords”: { “type”: “nested”, “properties”: { “lemma”: { “type”: “string”, “index”: “not_analyzed" } }}

Find top tags for most popular keywords’ lemmas

SOLUTION

{ "aggs": { "kw": { "nested": { "path": “keywords" }, "aggs": {

"top_lemmas": { "terms": { "field": “keywords.lemma" }, "aggs": { "kw_to_tags": { "reverse_nested": {}, "aggs": { "top_tags_per_lemma": { "terms": { "field": “tags" } }

…}

use nested aggregation together with terms and reverse_nested aggregations

RESPONSE EXAMPLE“aggregations”: { “kw”: { “doc_count”: 6829872, “top_lemmas”: { “buckets”: [ { “key”: “BMW” “doc_count”: 36665, “kw_to_lemma”: {

“doc_count”: 36626 “top_tags_per_lemma: { “buckets”: [

{ “key”: “auto” “doc_count”: 36626 }, { “key”: “car” “doc_count”: 12216 },

] …

PROBLEM IV

{ “tags”: { “type”: “string”, “index”: “not_analyzed" }, “text”: { “type”: “string”, “index”: “not_analyzed" }, “created_at”: { “type”: “date” }}

Find latest tweets for most popular tags

SOLUTIONuse terms and top_hits aggregations

{ “aggs”: {

“top_tags”: { “terms”: { “field”: “tags” }, “aggs”: { “top_tweets”: { “top_hits”: { “sort": [ { “created_at”: { “order”: “desc” } } ], }

…}

RESPONSE EXAMPLE“aggregations”: { “top_tags”: { “buckets”: [ { “key”: “#TheDress” “doc_count”: 30000 “top_tweets”: { “hits”: { “total”: 30000 “hits”: [ {

“_index”: “tweets”, “_type”: “tweet”, “_id”: “579024639982202880”, “_source”: { “tags”: [ “#TheDress”, “#TheSims4”] “text”: “just put #TheDress in #TheSims4!” “created_at”: 2015-03-20T20:00:01 } sort: [ 1422388801000 ]

…

PROBLEM V

{ “topics”: { “type”: “string”, “index”: “not_analyzed" }, “title”: { “type”: “string” }, “created_at”: { “type”: “date” }}

Find news that contain “Obama” in title and top topics from all news regardless the title

SOLUTIONuse query_string, global and terms aggregations

{ “query”: { “query_string”: { “default_field” : “title”, “query” : “Obama” } }, “aggs”: {

“all_news”: { “global” : {}, “aggs”: { “top_topics”: { “terms”: { “field”: “topics” }

…}

RESPONSE EXAMPLE“hits”: { “total”: 23, “max_score”: 2.9730792, “hits”: [ { “_index”: “news”, “_type”: ”record”, “_id”: 6785, “_score”: 2.9730792, “_source”: … }, … ]},“aggregations”: { “all_news”: { “doc_count”: 24495, “top_tags”: { “buckets”: [ { “key”: “Politics” “doc_count”: 20001 } …

THANK YOU

elk - what's new and showcases

Technology

elasticsearch nodes

users node authentication

new interface

plugin security features

dozens of new features

authentication attempts

s t i c s e

data table