elk - what's new and showcases
TRANSCRIPT
ELKopen source data visual izat ion platform that allows you to interact with your data through stunning, powerful graphics.
distributed, open source search and ana ly t i cs eng ine , des igned for horizontal scalability, reliability, and easy management.
flexible, open source data collection, parsing, and enrichment pipeline.
Shield brings enterprise-grade security to Elasticsearch, protecting the entire ELK stack with encrypted communications, authentication, role-based access control and auditing.
comprehensive tool that provides you with complete transparency into the s t a t u s o f yo u r E l a s t i c s e a r c h deployment.
Elasticsearch 1.4.4 Kibana 4.0.1
Logstash 1.4.2Marvel
Shield 1.0.1
SHIELD
Security as a Plugin Security features for Elasticsearch are implemented in a plugin that you install on each node in your cluster.
ARCHITECTURE NOTES
• The plugin intercepts inbound API calls in order to enforce authentication and authorization.
• The plugin provides encryption using Secure Sockets Layer/Transport Layer Security (SSL/TLS) for the network traffic to and from the Elasticsearch node.
• The plugin uses the API interception layer that enables authentication and authorization to provide audit logging capability.
MAIN FEATURES• User Authentication
Shield defines (realm) a known set of users in order to authenticate users that make requests. The supported realms are esusers and LDAP.
• Authorization Shield’s data model for action authorization includes: Secured Resource, Privilege, Permissions, Role, Users
• Node Authentication and Channel Encryption Shield use SSL/TLS to wrap usual node communication over port 9300. When SSL/TLS is enabled, the nodes validate each other’s certificates, establishing trust between the nodes.
• IP Filtering Shield provides IP-based access control for Elasticsearch nodes that allows to restrict which other servers, via their IP address, can connect to Elasticsearch nodes and make requests.
• Auditing The audit functionality in a secure Elasticsearch cluster logs particular events and activity on that cluster. The events logged include authentication attempts, including granted and denied access.
KIBANA
Kibana 4 provides dozens of new features that enable you to compose questions, get answers, and solve problems like never before.
WHAT’S NEW?• New interface with D3, drag&drop dashboard builder• New diagrams: Area Chart, Data Table, Markdown Text Widget, Pie Chart,
Raw Document Widget, Single Metric Widget, Tile Map, Vertical Bar Chart• Advanced aggregation-based analytics capabilities: Unique counts
(cardinality), Non-date histograms, Ranges, Significant terms, Percentiles etc.• Expressions-based scripted fields enable you to perform ad-hoc analysis by
performing computations on the fly• Search result highlighting• Ability to save searches and visualizations• Faster dashboard loading due to a reduction in the number HTTP calls
needed to load the page• SSL encryption for client requests as well as requests to and from
Elasticsearch
WHAT’S NEW? SINCE 1.2.0
• Upgraded to Lucene 4.10.1 release• New aggregations: percentiles_rank, top_hits, cardinality,
scripted_metric, …• Added sum of the doc counts of other buckets in terms aggs• Added support bounding box aggregation on geo_shape/
geo_point data types• Parent/child optimization• Added support for scripted upserts• Fielddata and cache optimisation• Removed deprecated gateway functionality• …
PERCENTILES RANK AGGREGATIONA multi-value metrics aggregation that calculates one or more percentile ranks over numeric values extracted from the aggregated documents.
{ “aggs” : { “load_time_outlier” : { “percentile_ranks” : { “field” : “load_time”, “values” : [15, 30] } } }}
{ “aggregations” : { “load_time_outlier” : { “values” : { “15”: 92, “30”: 100 } } }}
Example above shows that 92% of page were loaded within 15 sec, and 100% within 30 sec.
TOP HITS AGGREGATIONA top_hits metric aggregator keeps track of the most relevant document being aggregated. This aggregator is intended to be used as a sub aggregator, so that the top matching documents can be aggregated per bucket.
{ “aggs”: { “top_logs”: {
“top_hits”: { “sort": [ { “created_at”: { “order”: “desc” } } ], “_source”: { “include”: [ “path” ] }}
}
{“aggregations”: { “top_logs”: { “hits”: { “total”: 180 “hits”: [ {
“_index”: “logs”,“_type”: “log”,“_id”: “an893d30mlss”,“_source”: { “path”: “/home/user/”}sort: [ 1422388801000 ]
…}
CARDINALITY AGGREGATIONA single-value metrics aggregation that calculates an approximate count of distinct values. It is based on the HyperLogLog++ algorithm, which counts based on the hashes of the values with some interesting properties:• configurable precision, which decides on how to trade memory for accuracy,• excellent accuracy on low-cardinality sets,• fixed memory usage: no matter if there are tens or billions of unique values,
memory usage only depends on the configured precision.
{ “aggs” : { “tags_count” : { “cardinality” : { “field” : “tags”, “precision_threshold”: 100 } } }}
{ “aggregations” : { “tags_count” : { “value”: 120002 } }}
SCRIPTED METRIC AGGREGATIONA metric aggregation that executes using scripts to provide a metric output.
{ “aggs” : { "profit": { "scripted_metric": { "init_script" : "_agg['transactions'] = []", "map_script" : "if (doc['type'].value == \"sale\") { _agg.transactions.add(doc['amount'].value) } else { _agg.transactions.add(-1 * doc['amount'].value) }", "combine_script" : "profit = 0; for (t in _agg.transactions) { profit += t }; return profit", "reduce_script" : "profit = 0; for (a in _aggs) { profit += a }; return profit" } }}
PROBLEM I
{ “location”: { “type”: “geo_point” }, “tags”: { “type”: “string”, “index”: “not_analyzed” }, “text”: { “type”: “string”, “index”: “not_analyzed” }}
Find most popular tags per location (e.g. grouping by geohash with precision 10km x 10km)
SOLUTIONuse geohash_grid and terms aggregations
{ “aggs”: { “hotspots”: { “geohash_grid” : { “field”: “location”, “precision”: 10 }, "aggs": { “top_tags": { "terms": { “field”: “tags” } …}
RESPONSE EXAMPLE“aggregations”: { “hotspots”: { “buckets”: [ { "key": "dr5rs", "doc_count": 2 “top_tags”: { “buckets”: [
{ “key”: “#NY” “doc_count”: 20001 }, { “key”: “#Obama” “doc_count”: 1201 }, … ] }
},…
] …
PROBLEM II
{ “event”: { “type”: “string”, “index”: “not_analyzed" }, “rating”: { “type”: “float” } }}
Find total number of records and average rating for events with most number of rating records
SOLUTION
{ “aggs”: {
“top_events”: { “terms”: { “field”: “event” }, “aggs”: { “avg_rating”: { “avg”: { “field”: “rating” }
…}
use terms and avg aggregations
RESPONSE EXAMPLE“aggregations”: { “top_events”: { “buckets”: [ { “key”: “Venus Berlin” “doc_count”: 36665, “avg_rating”: {
“value”: 9.991 } },
{ “key”: “ITB Berlin” “doc_count”: 365, “avg_rating”: {
“value”: 8.46 } }
…
PROBLEM III
{ “tags”: { “type”: “string”, “index”: “not_analyzed" }, “keywords”: { “type”: “nested”, “properties”: { “lemma”: { “type”: “string”, “index”: “not_analyzed" } }}
Find top tags for most popular keywords’ lemmas
SOLUTION
{ "aggs": { "kw": { "nested": { "path": “keywords" }, "aggs": {
"top_lemmas": { "terms": { "field": “keywords.lemma" }, "aggs": { "kw_to_tags": { "reverse_nested": {}, "aggs": { "top_tags_per_lemma": { "terms": { "field": “tags" } }
…}
use nested aggregation together with terms and reverse_nested aggregations
RESPONSE EXAMPLE“aggregations”: { “kw”: { “doc_count”: 6829872, “top_lemmas”: { “buckets”: [ { “key”: “BMW” “doc_count”: 36665, “kw_to_lemma”: {
“doc_count”: 36626 “top_tags_per_lemma: { “buckets”: [
{ “key”: “auto” “doc_count”: 36626 }, { “key”: “car” “doc_count”: 12216 },
] …
PROBLEM IV
{ “tags”: { “type”: “string”, “index”: “not_analyzed" }, “text”: { “type”: “string”, “index”: “not_analyzed" }, “created_at”: { “type”: “date” }}
Find latest tweets for most popular tags
SOLUTIONuse terms and top_hits aggregations
{ “aggs”: {
“top_tags”: { “terms”: { “field”: “tags” }, “aggs”: { “top_tweets”: { “top_hits”: { “sort": [ { “created_at”: { “order”: “desc” } } ], }
…}
RESPONSE EXAMPLE“aggregations”: { “top_tags”: { “buckets”: [ { “key”: “#TheDress” “doc_count”: 30000 “top_tweets”: { “hits”: { “total”: 30000 “hits”: [ {
“_index”: “tweets”, “_type”: “tweet”, “_id”: “579024639982202880”, “_source”: { “tags”: [ “#TheDress”, “#TheSims4”] “text”: “just put #TheDress in #TheSims4!” “created_at”: 2015-03-20T20:00:01 } sort: [ 1422388801000 ]
…
PROBLEM V
{ “topics”: { “type”: “string”, “index”: “not_analyzed" }, “title”: { “type”: “string” }, “created_at”: { “type”: “date” }}
Find news that contain “Obama” in title and top topics from all news regardless the title
SOLUTIONuse query_string, global and terms aggregations
{ “query”: { “query_string”: { “default_field” : “title”, “query” : “Obama” } }, “aggs”: {
“all_news”: { “global” : {}, “aggs”: { “top_topics”: { “terms”: { “field”: “topics” }
…}
RESPONSE EXAMPLE“hits”: { “total”: 23, “max_score”: 2.9730792, “hits”: [ { “_index”: “news”, “_type”: ”record”, “_id”: 6785, “_score”: 2.9730792, “_source”: … }, … ]},“aggregations”: { “all_news”: { “doc_count”: 24495, “top_tags”: { “buckets”: [ { “key”: “Politics” “doc_count”: 20001 } …