Realtime Analytics with Elasticsearch [New Media Inspiration 2013]
DESCRIPTION
A presentation from the New Media Inspiration 2013 conference (http://www.tuesday.cz/akce/new-media-inspiration-2013/) about using Elasticsearch's faceting features for realtime analytics of big data.
TRANSCRIPT
![Page 1: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]](https://reader033.vdocument.in/reader033/viewer/2022051110/54c68cb24a7959a2128b471c/html5/thumbnails/1.jpg)
Real time analytics of big data with Elasticsearch
Karel Minařík
![Page 2: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]](https://reader033.vdocument.in/reader033/viewer/2022051110/54c68cb24a7959a2128b471c/html5/thumbnails/2.jpg)
JSON
Facets
Analytics
http://www.youtube.com/watch?v=-GftBySG99Q
![Page 3: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]](https://reader033.vdocument.in/reader033/viewer/2022051110/54c68cb24a7959a2128b471c/html5/thumbnails/3.jpg)
Realtime Analytics With ElasticSearch
http://karmi.cz
http://elasticsearch.com
![Page 4: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]](https://reader033.vdocument.in/reader033/viewer/2022051110/54c68cb24a7959a2128b471c/html5/thumbnails/4.jpg)
Using a search engine for analytics?
wat?
![Page 5: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]](https://reader033.vdocument.in/reader033/viewer/2022051110/54c68cb24a7959a2128b471c/html5/thumbnails/5.jpg)
HOW DOES SEARCH WORK?
A collection of documents
file_1.txt: The ruby is a pink to blood-red colored gemstone ...
file_2.txt: Ruby is a dynamic, reflective, general-purpose object-oriented programming language ...
file_3.txt: "Ruby" is a song by English rock band Kaiser Chiefs ...
![Page 6: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]](https://reader033.vdocument.in/reader033/viewer/2022051110/54c68cb24a7959a2128b471c/html5/thumbnails/6.jpg)
HOW DOES SEARCH WORK?
How do you search documents?
File.read('file_1.txt').include?('ruby')
File.read('file_2.txt').include?('ruby')
...
![Page 7: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]](https://reader033.vdocument.in/reader033/viewer/2022051110/54c68cb24a7959a2128b471c/html5/thumbnails/7.jpg)
HOW DOES SEARCH WORK?
The inverted index
http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices

| TOKENS | POSTINGS |
| --- | --- |
| ruby | file_1.txt, file_2.txt, file_3.txt |
| pink | file_1.txt |
| gemstone | file_1.txt |
| dynamic | file_2.txt |
| reflective | file_2.txt |
| programming | file_2.txt |
| song | file_3.txt |
| english | file_3.txt |
| rock | file_3.txt |
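A minimal sketch of how such a table could be built from the three sample documents. The stop-word list and the `tokenize` and `build_index` helpers are assumptions made for illustration; real analyzers in Lucene and Elasticsearch are considerably more involved.

```ruby
# Sample documents from the slides (only the visible part of each text).
DOCUMENTS = {
  'file_1.txt' => 'The ruby is a pink to blood-red colored gemstone',
  'file_2.txt' => 'Ruby is a dynamic, reflective, general-purpose object-oriented programming language',
  'file_3.txt' => '"Ruby" is a song by English rock band Kaiser Chiefs'
}.freeze

# Hypothetical stop-word list, chosen only to keep the output short.
STOPWORDS = %w[the is a to by].freeze

# Split text into unique lowercase word tokens and drop the stop words.
def tokenize(text)
  text.downcase.scan(/[a-z]+/).uniq - STOPWORDS
end

# Map every token to the list of files that contain it (its postings).
def build_index(documents)
  index = Hash.new { |hash, token| hash[token] = [] }
  documents.each do |name, text|
    tokenize(text).each { |token| index[token] << name }
  end
  index
end

index = build_index(DOCUMENTS)
puts index['ruby'].inspect
puts index['song'].inspect
```

The index is built once, so a lookup no longer needs to read every file the way the `File.read(...).include?` approach does.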
![Page 8: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]](https://reader033.vdocument.in/reader033/viewer/2022051110/54c68cb24a7959a2128b471c/html5/thumbnails/8.jpg)
HOW DOES SEARCH WORK?
The inverted index
http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices

| TOKENS | POSTINGS |
| --- | --- |
| ruby | file_1.txt, file_2.txt, file_3.txt |
| pink | file_1.txt |
| gemstone | file_1.txt |
| dynamic | file_2.txt |
| reflective | file_2.txt |
| programming | file_2.txt |
| song | file_3.txt |
| english | file_3.txt |
| rock | file_3.txt |

search "ruby"
![Page 9: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]](https://reader033.vdocument.in/reader033/viewer/2022051110/54c68cb24a7959a2128b471c/html5/thumbnails/9.jpg)
HOW DOES SEARCH WORK?
The inverted index
http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices

| TOKENS | POSTINGS |
| --- | --- |
| ruby | file_1.txt, file_2.txt, file_3.txt |
| pink | file_1.txt |
| gemstone | file_1.txt |
| dynamic | file_2.txt |
| reflective | file_2.txt |
| programming | file_2.txt |
| song | file_3.txt |
| english | file_3.txt |
| rock | file_3.txt |

search "song"
![Page 10: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]](https://reader033.vdocument.in/reader033/viewer/2022051110/54c68cb24a7959a2128b471c/html5/thumbnails/10.jpg)
HOW DOES SEARCH WORK?
The inverted index
http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices

| TOKENS | POSTINGS |
| --- | --- |
| ruby | file_1.txt, file_2.txt, file_3.txt |
| pink | file_1.txt |
| gemstone | file_1.txt |
| dynamic | file_2.txt |
| reflective | file_2.txt |
| programming | file_2.txt |
| song | file_3.txt |
| english | file_3.txt |
| rock | file_3.txt |

search "ruby AND song"
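The three searches on these slides can be read as lookups in the token table, with AND as an intersection of postings lists. The sketch below assumes that reading; `INDEX` is copied from the slide and `search_all` is a hypothetical helper, not an Elasticsearch API.

```ruby
# Inverted index copied from the slide: token => postings (file names).
INDEX = {
  'ruby'        => %w[file_1.txt file_2.txt file_3.txt],
  'pink'        => %w[file_1.txt],
  'gemstone'    => %w[file_1.txt],
  'dynamic'     => %w[file_2.txt],
  'reflective'  => %w[file_2.txt],
  'programming' => %w[file_2.txt],
  'song'        => %w[file_3.txt],
  'english'     => %w[file_3.txt],
  'rock'        => %w[file_3.txt]
}.freeze

# Return the files that contain every one of the given terms.
# Unknown terms have an empty postings list, so they match nothing.
def search_all(index, *terms)
  postings = terms.map { |term| index.fetch(term.downcase, []) }
  postings.reduce { |common, list| common & list } || []
end

puts search_all(INDEX, 'ruby').inspect          # search "ruby"
puts search_all(INDEX, 'song').inspect          # search "song"
puts search_all(INDEX, 'ruby', 'song').inspect  # search "ruby AND song"
```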
![Page 11: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]](https://reader033.vdocument.in/reader033/viewer/2022051110/54c68cb24a7959a2128b471c/html5/thumbnails/11.jpg)
HOW DOES SEARCH WORK?
The inverted index
http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices

| TOKENS | POSTINGS |
| --- | --- |
| ruby | file_1.txt, file_2.txt, file_3.txt |
| pink | file_1.txt |
| gemstone | file_1.txt |
| dynamic | file_2.txt |
| reflective | file_2.txt |
| programming | file_2.txt |
| song | file_3.txt |
| english | file_3.txt |
| rock | file_3.txt |
31
Statistics!
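One way to read "Statistics!" is that the length of each postings list is already a count of matching documents. The sketch below assumes that interpretation; `document_frequencies` and `top_tokens` are hypothetical helper names.

```ruby
# Inverted index copied from the slide: token => postings (file names).
POSTINGS = {
  'ruby'        => %w[file_1.txt file_2.txt file_3.txt],
  'pink'        => %w[file_1.txt],
  'gemstone'    => %w[file_1.txt],
  'dynamic'     => %w[file_2.txt],
  'reflective'  => %w[file_2.txt],
  'programming' => %w[file_2.txt],
  'song'        => %w[file_3.txt],
  'english'     => %w[file_3.txt],
  'rock'        => %w[file_3.txt]
}.freeze

# Count how many documents contain each token.
def document_frequencies(postings)
  postings.transform_values(&:size)
end

# Return the n tokens found in the most documents as [token, count] pairs.
def top_tokens(postings, n)
  document_frequencies(postings).max_by(n) { |_token, count| count }
end

puts document_frequencies(POSTINGS).inspect
puts top_tokens(POSTINGS, 1).inspect
```

No document text is read again to produce these numbers; they come straight from the index structure.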
![Page 13: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]](https://reader033.vdocument.in/reader033/viewer/2022051110/54c68cb24a7959a2128b471c/html5/thumbnails/13.jpg)
ElasticSearch is an open source, scalable, distributed, cloud-ready, highly-available full-text search engine and database with powerful aggregation features, communicating by JSON over RESTful HTTP, based on Apache Lucene.
![Page 14: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]](https://reader033.vdocument.in/reader033/viewer/2022051110/54c68cb24a7959a2128b471c/html5/thumbnails/14.jpg)
FACETS
Faceted Navigation
http://blog.linkedin.com/2009/12/14/linkedin-faceted-search/
Query
Facets
![Page 15: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]](https://reader033.vdocument.in/reader033/viewer/2022051110/54c68cb24a7959a2128b471c/html5/thumbnails/15.jpg)
FACETS
Faceted Navigation with Elasticsearch
http://www.elasticsearch.org/guide/reference/api/search/facets/index.html
Request: user query ("query"), “checkboxes” ("filter"), facets ("facets")
curl "http://localhost:9200/people/_search?pretty=true" -d '{
  "query"  : { "match" : { "name" : "John" } },
  "filter" : { "terms" : { "employer" : ["IBM"] } },
  "facets" : {
    "employer" : { "terms" : { "field" : "employer", "size" : 3 } }
  }
}'
Response
"facets" : {
  "employer" : {
    "missing" : 0,
    "total" : 10,
    "other" : 3,
    "terms" : [
      { "term" : "ibm", "count" : 3 },
      { "term" : "twitter", "count" : 2 },
      { "term" : "apple", "count" : 2 }
    ]
  }
}
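To make the fields of the response concrete, here is a rough in-memory imitation of a terms facet. It is a sketch, not how Elasticsearch computes facets: the `PEOPLE` data and the `terms_facet` helper are hypothetical, and ties are broken alphabetically, which may differ from the server's ordering.

```ruby
require 'json'

# Hypothetical sample data: one hash per person, already lowercased.
PEOPLE = %w[ibm ibm ibm twitter twitter apple apple google oracle sun]
         .map { |employer| { 'employer' => employer } }
         .freeze

# Build a terms-facet-like summary of one field:
#   missing = documents without a value for the field
#   total   = number of values counted
#   other   = values counted but not listed among the top `size` terms
def terms_facet(documents, field, size)
  values  = documents.map { |doc| doc[field] }
  present = values.compact
  counts  = present.each_with_object(Hash.new(0)) { |value, acc| acc[value] += 1 }
  top     = counts.sort_by { |term, count| [-count, term] }.first(size)
  listed  = top.sum { |_term, count| count }
  {
    'missing' => values.size - present.size,
    'total'   => present.size,
    'other'   => present.size - listed,
    'terms'   => top.map { |term, count| { 'term' => term, 'count' => count } }
  }
end

puts JSON.pretty_generate('facets' => { 'employer' => terms_facet(PEOPLE, 'employer', 3) })
```

With this sample data the three listed terms account for 7 of the 10 values, which is why `other` is 3.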
![Page 16: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]](https://reader033.vdocument.in/reader033/viewer/2022051110/54c68cb24a7959a2128b471c/html5/thumbnails/16.jpg)
FACETS
Visualizing the Facets
"facets" : {
  "employer" : {
    "missing" : 0,
    "total" : 10,
    "other" : 3,
    "terms" : [
      { "term" : "ibm", "count" : 3 },
      { "term" : "twitter", "count" : 2 },
      { "term" : "apple", "count" : 2 }
    ]
  }
}
d3.js ~ A Bar Chart, Part 1
http://mbostock.github.com/d3/tutorial/bar-1.html
DEMO: http://bl.ocks.org/4571766
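The slide draws the chart with d3.js. As a rough stand-in, the sketch below turns the same facet response into a text bar chart, one line per term. The `bar_lines` helper is hypothetical, and the response fragment from the slide is wrapped in braces so that it parses as JSON.

```ruby
require 'json'

# Facet response from the slide, wrapped in {} to form a valid JSON document.
RESPONSE_JSON = <<~EOS
  { "facets" : { "employer" : { "missing" : 0, "total" : 10, "other" : 3,
    "terms" : [ { "term" : "ibm", "count" : 3 },
                { "term" : "twitter", "count" : 2 },
                { "term" : "apple", "count" : 2 } ] } } }
EOS

# Turn the terms of one facet into lines of the form "<term> <bar> <count>",
# with the term padded so that all bars start in the same column.
def bar_lines(response_json, facet_name)
  terms = JSON.parse(response_json).dig('facets', facet_name, 'terms') || []
  width = terms.map { |entry| entry['term'].length }.max || 0
  terms.map do |entry|
    format("%-#{width}s %s %d", entry['term'], '#' * entry['count'], entry['count'])
  end
end

puts bar_lines(RESPONSE_JSON, 'employer')
```

The point carries over to any charting library: the `terms` array is already in the shape a bar chart needs, so no further aggregation happens on the client.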
![Page 17: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]](https://reader033.vdocument.in/reader033/viewer/2022051110/54c68cb24a7959a2128b471c/html5/thumbnails/17.jpg)
FACETS
Visualizing the Facets
![Page 18: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]](https://reader033.vdocument.in/reader033/viewer/2022051110/54c68cb24a7959a2128b471c/html5/thumbnails/18.jpg)
FACETS
Visualizing the Facets
![Page 20: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]](https://reader033.vdocument.in/reader033/viewer/2022051110/54c68cb24a7959a2128b471c/html5/thumbnails/20.jpg)
Important Concepts
‣ No batch orientation
‣ No stats precomputation and caching
‣ No predefined metrics or schemas
‣ Combination of free text search, structured search, and facets
‣ Scripting for performing ad-hoc analytics
‣ Extendable: write your own facet types
![Page 21: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]](https://reader033.vdocument.in/reader033/viewer/2022051110/54c68cb24a7959a2128b471c/html5/thumbnails/21.jpg)
FACETS
Scripting
curl -X DELETE localhost:9200/demo-articles
curl -X POST localhost:9200/demo-articles -d '{
  "mappings": { "a": { "properties": { "url": { "type": "string", "index": "not_analyzed" } } } }
}'
curl -X PUT localhost:9200/demo-articles/a/1 -d '{"title":"...","url":"http://some.blogger.com/2012/09/01/index.html"}'
curl -X PUT localhost:9200/demo-articles/a/2 -d '{"title":"...","url":"http://some.blogger.com/2012/09/11/index.html"}'
curl -X PUT localhost:9200/demo-articles/a/3 -d '{"title":"...","url":"http://some.blogger.com/about.html"}'
curl -X PUT localhost:9200/demo-articles/a/5 -d '{"title":"...","url":"https://github.com/user/A"}'
curl -X PUT localhost:9200/demo-articles/a/5 -d '{"title":"...","url":"http://github.com/user/B"}'
curl -X POST localhost:9200/demo-articles/_refresh
curl -X GET 'localhost:9200/demo-articles/_search/?search_type=count&pretty' -d '{
  "facets": {
    "popular-domains": {
      "terms": {
        "field"  : "url",
        "script" : "term.replace(new RegExp(\"https?://\"), \"\").split(\"/\")[0]",
        "lang"   : "javascript"
      }
    }
  }
}'
Extract and aggregate most popular domains from article URLs
"facets" : {
  "popular-domains" : {
    // ...
    "terms" : [
      { "term" : "some.blogger.com", "count" : 3 },
      { "term" : "github.com", "count" : 1 }
    ]
  }
}
Response
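The facet script can be traced by hand. The sketch below applies the same two steps (strip the scheme, keep the text before the first "/") in plain Ruby; `domain` and `popular_domains` are hypothetical helpers. Document 5 is written twice in the commands above, so only its second URL is left in the index, which is consistent with the count of 1 for github.com.

```ruby
# URLs of the documents left in the index after the indexing commands.
# Document 5 is PUT twice, so only its second URL remains.
URLS = [
  'http://some.blogger.com/2012/09/01/index.html',
  'http://some.blogger.com/2012/09/11/index.html',
  'http://some.blogger.com/about.html',
  'http://github.com/user/B'
].freeze

# Same steps as the facet script: drop "http://" or "https://",
# then keep everything before the first "/".
def domain(url)
  url.sub(%r{https?://}, '').split('/').first
end

# Count URLs per domain, most frequent first (ties sorted by name).
def popular_domains(urls)
  counts = urls.each_with_object(Hash.new(0)) { |url, acc| acc[domain(url)] += 1 }
  counts.sort_by { |name, count| [-count, name] }
end

popular_domains(URLS).each { |name, count| puts "#{name}: #{count}" }
```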
![Page 22: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]](https://reader033.vdocument.in/reader033/viewer/2022051110/54c68cb24a7959a2128b471c/html5/thumbnails/22.jpg)
FACETS
Demonstrations
Demo
![Page 23: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]](https://reader033.vdocument.in/reader033/viewer/2022051110/54c68cb24a7959a2128b471c/html5/thumbnails/23.jpg)
Thanks!