Practical Elasticsearch - real world use cases

Itamar Syn-Hershko http://code972.com @synhershko Practical Elasticsearch

Upload: itamar

Post on 07-Aug-2015

Category:

Software


TRANSCRIPT

Page 1: Practical Elasticsearch - real world use cases

Itamar Syn-Hershko

http://code972.com

@synhershko

Practical Elasticsearch

Page 2: Practical Elasticsearch - real world use cases

Me?

• Itamar Syn-Hershko / @synhershko

• Lucene.NET PMC and lead committer

• Microsoft MVP

• RavenDB

– X-Core developer

– “RavenDB in Action” author

• Consulting Partner

Page 3: Practical Elasticsearch - real world use cases
Page 4: Practical Elasticsearch - real world use cases

An index

Page 5: Practical Elasticsearch - real world use cases

Elasticsearch

• Powered by Apache Lucene

• Open-source

• Rapid growth

• High profile users world-wide

Page 6: Practical Elasticsearch - real world use cases

REST API

• Indexes

• Types

• IDs

$ curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
    "user" : "synhershko",
    "post_date" : "2013-05-30T14:12:12",
    "message" : "trying out Elastic Search",
    "followers" : 3,
    "registered" : true
}'
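
The same index/type/ID addressing can be sketched in Python. This is a minimal illustration using only the standard library, not an official client: the helper name build_index_request is made up for this example, and the request is only built, not sent, so it runs without a node. The Content-Type header is an addition that newer Elasticsearch versions expect, and later releases moved away from type names in the URL, so the path may need adjusting for your version.

```python
import json
import urllib.request


def build_index_request(host, index, doc_type, doc_id, document):
    """Build (but do not send) a PUT request that indexes one JSON document.

    Hypothetical helper for illustration; the URL follows the
    /index/type/id layout shown in the curl example.
    """
    url = f"{host}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


tweet = {
    "user": "synhershko",
    "post_date": "2013-05-30T14:12:12",
    "message": "trying out Elastic Search",
    "followers": 3,
    "registered": True,
}

request = build_index_request("http://localhost:9200", "twitter", "tweet", 1, tweet)
print(request.get_method(), request.full_url)

# Sending it requires a node listening on localhost:9200 (an assumption):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```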

Page 7: Practical Elasticsearch - real world use cases

Full-Text Search

Page 8: Practical Elasticsearch - real world use cases

Full-text Search 101: The inverted index

6 documents to index:

1  The old night keeper keeps the keep in the town
2  In the big old house in the big old gown.
3  The house in the town had the big old keep
4  Where the old night keeper never did sleep.
5  The night keeper keeps the keep in the night
6  And keeps in the dark and sleeps in the light.

The index: dictionary and posting lists

Term      Documents
and       <6>
big       <2> <3>
dark      <6>
did       <4>
gown      <2>
had       <3>
house     <2> <3>
in        <1> <2> <3> <5> <6>
keep      <1> <3> <5>
keeper    <1> <4> <5>
keeps     <1> <5> <6>
light     <6>
never     <4>
night     <1> <4> <5>
old       <1> <2> <3> <4>
sleep     <4>
sleeps    <6>
the       <1> <2> <3> <4> <5> <6>
town      <1> <3>
where     <4>

Example from: Justin Zobel, Alistair Moffat, Inverted files for text search engines, ACM Computing Surveys (CSUR) v.38 n.2, p.6-es, 2006

Page 9: Practical Elasticsearch - real world use cases

Full-text Search 101: The inverted index


User queries for “keeper”
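
The dictionary and posting lists for the six example documents can be reproduced with a few lines of Python. This is a toy sketch, not how Lucene stores its index: the tokenizer (lowercase, letters only) and the function names are assumptions made for the example.

```python
import re
from collections import defaultdict

DOCUMENTS = {
    1: "The old night keeper keeps the keep in the town",
    2: "In the big old house in the big old gown.",
    3: "The house in the town had the big old keep",
    4: "Where the old night keeper never did sleep.",
    5: "The night keeper keeps the keep in the night",
    6: "And keeps in the dark and sleeps in the light.",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_inverted_index(documents):
    """Map each term to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


index = build_inverted_index(DOCUMENTS)

# A query for "keeper" is a single dictionary lookup, not a scan of the documents.
print(index["keeper"])  # [1, 4, 5]
```

Note that without any normalization beyond lowercasing, "keeper", "keeps" and "keep" are three unrelated terms, so this query does not find document 3 or 6.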

Page 10: Practical Elasticsearch - real world use cases

Term Normalization

• Lowercasing

• Stop words (grey)

– Not best practice anymore

• Stemming

– Porter stemmer

– s-stemmer

• Relevance++

• SizeOnDisk--
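
A rough sketch of such a normalization chain, assuming a tiny stop-word list and a simplified plural stemmer in the spirit of the s-stemmer (the exact rules, the length guard and the stop list are assumptions for this example, not what any particular Elasticsearch analyzer does):

```python
import re

# Assumed stop list for the example; real analyzers ship their own lists.
STOP_WORDS = {"and", "in", "the"}


def s_stem(word):
    """Strip simple plural endings (a simplified s-stemmer-style rule set)."""
    if len(word) <= 3:  # assumption: leave very short words alone
        return word
    if word.endswith("ies") and not word.endswith(("eies", "aies")):
        return word[:-3] + "y"      # ponies -> pony
    if word.endswith("es") and not word.endswith(("aes", "ees", "oes")):
        return word[:-1]            # houses -> house
    if word.endswith("s") and not word.endswith(("us", "ss")):
        return word[:-1]            # keeps -> keep
    return word


def normalize(text):
    """Lowercase, drop stop words, then stem each remaining token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [s_stem(token) for token in tokens if token not in STOP_WORDS]


print(normalize("And keeps in the dark and sleeps in the light."))
# ['keep', 'dark', 'sleep', 'light']
```

With this chain "keeps" and "keep" collapse into one dictionary entry, which is where both the smaller index and the broader matching come from.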

Page 11: Practical Elasticsearch - real world use cases

Full-Text Search

Your data store

Page 12: Practical Elasticsearch - real world use cases

How hard is it to get search right, anyway?

Page 13: Practical Elasticsearch - real world use cases

Relevance

• Precision: the fraction of the retrieved documents that are relevant

• Recall: the fraction of the relevant documents that are retrieved

• Order of results
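
The two definitions translate directly into set arithmetic. A small sketch, with made-up document ids standing in for a real result set and a real relevance judgment:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result set against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: the engine returned four documents, three are relevant,
# and no relevant document was missed.
print(precision_recall(retrieved=[1, 4, 5, 6], relevant=[1, 4, 5]))  # (0.75, 1.0)
```

Neither number says anything about ordering, which is why the order of results is listed as a separate concern.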

Page 14: Practical Elasticsearch - real world use cases

Challenges with search

• Relevance

• Getting the tokens right

– Tokenization

– Stemming

• Multi-lingual content

– Or other cross-cutting search concerns

• Tolerance

Page 15: Practical Elasticsearch - real world use cases

Real-time Analytics

Page 16: Practical Elasticsearch - real world use cases

Real-time Analytics

Queue (Redis)

“Shippers”

“Indexer”

Page 17: Practical Elasticsearch - real world use cases

Scaling out

Page 18: Practical Elasticsearch - real world use cases

Moar use cases!

Page 19: Practical Elasticsearch - real world use cases

#1: Real-Time Alerting System

Page 20: Practical Elasticsearch - real world use cases

Percolation
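
Percolation turns search around: queries are stored up front, and each incoming document is checked against them. The toy class below only illustrates that idea in memory; the class name, the "all terms must appear" query model and the sample alerts are assumptions for this sketch and not the Elasticsearch percolator API.

```python
class ToyPercolator:
    """Store queries, then report which of them match a new document."""

    def __init__(self):
        self.queries = {}

    def register(self, query_id, required_terms):
        # A "query" here is simply a set of terms that must all be present.
        self.queries[query_id] = {term.lower() for term in required_terms}

    def percolate(self, document_text):
        terms = set(document_text.lower().split())
        return sorted(
            query_id
            for query_id, required in self.queries.items()
            if required <= terms
        )


percolator = ToyPercolator()
percolator.register("disk-alert", ["disk", "full"])
percolator.register("login-alert", ["failed", "login"])

print(percolator.percolate("Disk is full on node-3"))  # ['disk-alert']
```

In an alerting setup, every matching query id would trigger a notification for whoever registered that query.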

Page 21: Practical Elasticsearch - real world use cases

#2: Smarter query parsing

Page 22: Practical Elasticsearch - real world use cases

Matching inexact queries

• Phrase slop

– “Bridge of London” -> “London Bridge”

• Word-level edit distance with fuzzy queries

– ditsance -> distance

– color -> colour
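
To make the edit-distance idea concrete, here is a small sketch of an edit distance that counts insertions, deletions, substitutions and swaps of adjacent characters (the optimal string alignment variant of Damerau-Levenshtein). It is an illustration of the measure, written from scratch, and not the automaton-based implementation a search engine would use.

```python
def edit_distance(a, b):
    """Edit distance with adjacent transpositions counted as one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] holds the distance between a[:i] and b[:j].
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


print(edit_distance("ditsance", "distance"))  # 1
print(edit_distance("color", "colour"))       # 1
```

Both examples are within one edit, so a fuzzy query allowing a distance of 1 would match them.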

Page 23: Practical Elasticsearch - real world use cases

#3: Offline Classification

Page 24: Practical Elasticsearch - real world use cases

Structuring the unstructured

• Record linkage

– Bag of words model

– “More Like This” functionality

• NLP

• Entity extraction
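
A bag-of-words comparison for record linkage can be sketched as follows. The records are invented for the example, the similarity measure is plain cosine similarity over term counts, and this is not how the "More Like This" query is implemented; it only shows why two differently formatted records can still be recognized as similar.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase alphanumeric tokens, ignoring order and punctuation."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(record, candidates):
    """Return the candidate whose bag of words is closest to the record."""
    target = bag_of_words(record)
    return max(candidates, key=lambda c: cosine_similarity(target, bag_of_words(c)))


# Hypothetical records from two different sources.
candidates = [
    "ACME Corporation 12 Main St Springfield",
    "Globex Inc 99 Ocean Avenue Shelbyville",
]
print(most_similar("Acme Corp., 12 Main Street, Springfield", candidates))
# ACME Corporation 12 Main St Springfield
```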

Page 25: Practical Elasticsearch - real world use cases

#4: Everything is searchable

Page 26: Practical Elasticsearch - real world use cases

Geo-spatial search

• Distance

• Shape interactions

• Multiple algorithms
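
As one example of a distance algorithm, the haversine formula gives the great-circle distance between two points. The sketch below assumes a spherical Earth with a mean radius of 6371 km; the place names and their approximate coordinates are only sample data, and a brute-force filter like this stands in for what a geo index does far more efficiently.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(center, points, radius_km):
    """Names of all points no farther than radius_km from center."""
    return sorted(
        name
        for name, (lat, lon) in points.items()
        if haversine_km(center[0], center[1], lat, lon) <= radius_km
    )


# Approximate coordinates, used purely as sample data.
london = (51.5074, -0.1278)
places = {"greenwich": (51.4769, 0.0005), "paris": (48.8566, 2.3522)}

print(within_radius(london, places, radius_km=50))  # ['greenwich']
```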

Page 27: Practical Elasticsearch - real world use cases

Geo-spatial search

Page 28: Practical Elasticsearch - real world use cases
Page 29: Practical Elasticsearch - real world use cases

Image search

http://colors.qbox.io/

Page 30: Practical Elasticsearch - real world use cases

http://cs.stanford.edu/people/karpathy/deepimagesent

Deep Visual-Semantic Alignments for Generating Image Descriptions

Page 31: Practical Elasticsearch - real world use cases

#5: Anomaly detection

Page 32: Practical Elasticsearch - real world use cases

The Significant Terms Aggregation

Page 33: Practical Elasticsearch - real world use cases

Uncommonly common

Mark Harwood’s talk at

http://www.infoq.com/presentations/elasticsearch-revealing-uncommonly-common
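
The "uncommonly common" idea compares how often a term appears in a subset of interest (the foreground) with how often it appears overall (the background). The sketch below scores a term as the absolute change in frequency multiplied by the relative change, which is along the lines of the JLH heuristic; the function name and all counts are assumptions made up for this example.

```python
def significance_score(fg_count, fg_total, bg_count, bg_total):
    """Score a term by (fg% - bg%) * (fg% / bg%); 0 if it is not over-represented."""
    fg_rate = fg_count / fg_total
    bg_rate = bg_count / bg_total
    if bg_rate == 0 or fg_rate <= bg_rate:
        return 0.0
    return (fg_rate - bg_rate) * (fg_rate / bg_rate)


# Hypothetical counts: 100 documents in the foreground, 10,000 in the background.
rare_but_concentrated = significance_score(40, 100, 50, 10_000)
common_everywhere = significance_score(90, 100, 9_000, 10_000)

print(rare_but_concentrated > common_everywhere)  # True
```

A term that appears in 90% of documents everywhere scores zero, while a term that is rare overall but frequent in the foreground stands out, which is the signal used for anomaly detection.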

Page 34: Practical Elasticsearch - real world use cases

#6: Debugging a distributed system

Queue (Redis)

Page 35: Practical Elasticsearch - real world use cases

#6: Debugging a distributed system

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"

System.NullReferenceException: Object reference not set to an instance of an object.
   at System.Collections.Generic.Dictionary`2.Insert(TKey key, TValue value, Boolean add)
   at AjaxControlToolkit.ToolkitScriptManager.GetScriptCombineAttributes(Assembly assembly)
   at AjaxControlToolkit.ToolkitScriptManager.IsScriptCombinable(ScriptEntry scriptEntry)
   at AjaxControlToolkit.ToolkitScriptManager.OnResolveScriptReference(ScriptReferenceEventArgs e)
   at System.Web.UI.ScriptManager.RegisterScripts()
   at System.Web.UI.ScriptManager.OnPagePreRenderComplete(Object sender, EventArgs e)
   at System.Web.UI.Page.OnPreRenderComplete(EventArgs e)
   at System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint)
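
Before log lines like these become searchable, an indexer has to turn them into structured fields. The sketch below parses an access-log line in the combined log format with a regular expression; the field names and the helper parse_access_log are assumptions for this example, and in a real pipeline this step would typically be handled by the log-shipping tooling rather than hand-written code.

```python
import re

# One named group per field of a combined-format access-log line.
LOG_PATTERN = re.compile(
    r'(?P<client>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_access_log(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    event = match.groupdict()
    event["status"] = int(event["status"])
    event["bytes"] = 0 if event["bytes"] == "-" else int(event["bytes"])
    return event


line = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
)

event = parse_access_log(line)
print(event["status"], event["bytes"])  # 200 2326
```

Once every line is a document with fields such as status, path and client, questions like "which node returned the most 500s in the last hour" become ordinary queries and aggregations.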

Page 36: Practical Elasticsearch - real world use cases

#7: Distributed git storage

• PoC in C# using libgit2sharp

• https://github.com/synhershko/libgit2sharp.Elasticsearch

• Kudos @nulltoken

Page 37: Practical Elasticsearch - real world use cases

Putting this to practice

• Search on your data

– Data doesn’t have to be structured to be queried

• Use your logs to gain insight

– Metrics

– Establish a baseline

– Investigate unexpected or unfamiliar behaviors

Page 38: Practical Elasticsearch - real world use cases

Thank you. Questions?

Itamar Syn-Hershko

http://code972.com

@synhershko