Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)
DESCRIPTION
Talk at the Webexpo 2011 Conference in Prague (http://webexpo.net/)
TRANSCRIPT
![Page 1: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/1.jpg)
ElasticSearch: Beyond Ordinary Fulltext Search
Karel Minařík
![Page 3: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/3.jpg)
ElasticSearch
AUDIENCE POLL
Does your application have a search feature?
![Page 4: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/4.jpg)
What do you use for search?
1. SELECT ... LIKE %foo%
2. Sphinx
3. Apache Solr
4. ElasticSearch
![Page 5: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/5.jpg)
Search is the primary interface for getting information today.
![Page 6: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/6.jpg)
![Page 7: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/7.jpg)
http://www.apple.com/macosx/what-is-macosx/spotlight.html
![Page 8: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/8.jpg)
http://www.apple.com/iphone/features/search.html
![Page 9: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/9.jpg)
???
![Page 10: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/10.jpg)
???
![Page 11: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/11.jpg)
![Page 12: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/12.jpg)
#uxfail???
![Page 13: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/13.jpg)
Y U NO ALIGN???
![Page 14: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/14.jpg)
![Page 15: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/15.jpg)
???
![Page 16: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/16.jpg)
???
![Page 17: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/17.jpg)
![Page 18: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/18.jpg)
Search is hard. Let's go write SQL queries!
![Page 19: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/19.jpg)
How do you implement search?
WHY SEARCH SUCKS?
def search
  @results = MyModel.search params[:q]
  respond_with @results
end
![Page 20: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/20.jpg)
Query
Results
Result
MAGIC
![Page 21: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/21.jpg)
MAGIC + /
![Page 22: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/22.jpg)
A personal story...
670px
23px
![Page 23: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/23.jpg)
MyModel.search "(this OR that) AND NOT whatever"
articles = Arel::Table.new(:articles)
comments = Arel::Table.new(:comments)

articles.
  where(articles[:title].eq('On Search')).
  where(articles[:published_on].gteq(Time.now)).
  join(comments).
    on(articles[:id].eq(comments[:article_id])).
  take(5).
  skip(4).
  to_sql
Compare your search library with your ORM library
![Page 24: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/24.jpg)
How does search work?
![Page 25: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/25.jpg)
A collection of documents
HOW DOES SEARCH WORK?
file_1.txt: The ruby is a pink to blood-red colored gemstone ...
file_2.txt: Ruby is a dynamic, reflective, general-purpose object-oriented programming language ...
file_3.txt: "Ruby" is a song by English rock band Kaiser Chiefs ...
![Page 26: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/26.jpg)
How do you search documents?
File.read('file_1.txt').include?('ruby')
File.read('file_2.txt').include?('ruby')
...
![Page 27: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/27.jpg)
The inverted index
http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
TOKENS POSTINGS
ruby file_1.txt file_2.txt file_3.txt
pink file_1.txt
gemstone file_1.txt
dynamic file_2.txt
reflective file_2.txt
programming file_2.txt
song file_3.txt
english file_3.txt
rock file_3.txt
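The table above can be reproduced in a few lines of Ruby. This is a minimal sketch, assuming the document snippets fit in memory and that a token is simply a run of ASCII letters (no stopwords, no stemming); `build_inverted_index` and `lookup` are hypothetical names.

```ruby
require 'set'

# The three sample documents from the slide (snippets only).
DOCUMENTS = {
  'file_1.txt' => 'The ruby is a pink to blood-red colored gemstone',
  'file_2.txt' => 'Ruby is a dynamic, reflective, general-purpose object-oriented programming language',
  'file_3.txt' => '"Ruby" is a song by English rock band Kaiser Chiefs'
}

# Map every lowercased word (token) to the set of documents containing it (postings).
def build_inverted_index(documents)
  index = Hash.new { |hash, token| hash[token] = Set.new }
  documents.each do |name, text|
    text.downcase.scan(/[a-z]+/).each { |token| index[token] << name }
  end
  index
end

INVERTED_INDEX = build_inverted_index(DOCUMENTS)

# A lookup is a single hash access instead of reading every file.
def lookup(token)
  INVERTED_INDEX.fetch(token.downcase, Set.new).to_a.sort
end

p lookup('ruby')  # => ["file_1.txt", "file_2.txt", "file_3.txt"]
p lookup('song')  # => ["file_3.txt"]
```

The work moves from query time to indexing time: each document is read once, and every later search is a hash lookup.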
![Page 28: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/28.jpg)
MySearchLib.search "ruby"
![Page 29: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/29.jpg)
MySearchLib.search "song"
![Page 30: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/30.jpg)
MySearchLib.search "ruby AND song"
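A query such as "ruby AND song" is answered by intersecting the posting lists of both terms. A sketch, assuming posting lists are kept sorted (copied from the inverted index table above); `POSTINGS`, `intersect` and `search_all` are hypothetical names, and the merge walk is the textbook technique, not necessarily what any particular library does.

```ruby
# Hypothetical posting lists, sorted by document name.
POSTINGS = {
  'ruby' => ['file_1.txt', 'file_2.txt', 'file_3.txt'],
  'song' => ['file_3.txt'],
  'pink' => ['file_1.txt']
}

# Merge-style intersection of two sorted lists: each list is walked only once.
def intersect(left, right)
  result = []
  i = j = 0
  while i < left.size && j < right.size
    case left[i] <=> right[j]
    when 0
      result << left[i]
      i += 1
      j += 1
    when -1 then i += 1
    else j += 1
    end
  end
  result
end

# "a AND b AND ..." is the intersection of all posting lists; unknown terms match nothing.
def search_all(*terms)
  lists = terms.map { |term| POSTINGS.fetch(term, []) }
  lists.reduce { |acc, list| intersect(acc, list) } || []
end

p search_all('ruby', 'song')  # => ["file_3.txt"]
p search_all('pink', 'song')  # => []
```

OR would be the analogous merge that keeps every document seen in either list.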
![Page 31: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/31.jpg)
module SimpleSearch

  def index document, content
    tokens = analyze content
    store document, tokens
    puts "Indexed document #{document} with tokens:", tokens.inspect, "\n"
  end

  def analyze content
    content.split(/\W/).              # >>> Split content by words into "tokens"
      map { |word| word.downcase }.   # >>> Downcase every word
      reject { |word| STOPWORDS.include?(word) || word =~ /^\d+/ || word == '' }  # >>> Reject stop words, digits and whitespace
  end

  def store document_id, tokens
    tokens.each do |token|
      # >>> Save the "posting"
      ( (INDEX[token] ||= []) << document_id ).uniq!
    end
  end

  def search token
    puts "Results for token '#{token}':"
    # >>> Print documents stored in index for this token (none if the token is unknown)
    (INDEX[token] || []).each { |document| puts " * #{document}" }
  end

  INDEX = {}
  STOPWORDS = %w|a an and are as at but by for if in is it no not of on or that the then there these this to with|

  extend self

end
A naïve Ruby implementation
![Page 32: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/32.jpg)
SimpleSearch.index "file1", "Ruby is a language. Java is also a language."
SimpleSearch.index "file2", "Ruby is a song."
SimpleSearch.index "file3", "Ruby is a stone."
SimpleSearch.index "file4", "Java is a language."
Indexed document file1 with tokens:["ruby", "language", "java", "also", "language"]
Indexed document file2 with tokens:["ruby", "song"]
Indexed document file3 with tokens:["ruby", "stone"]
Indexed document file4 with tokens:["java", "language"]
Indexing documents
Words downcased, stopwords removed.
![Page 33: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/33.jpg)
puts "What's in our index?"
p SimpleSearch::INDEX
{ "ruby" => ["file1", "file2", "file3"], "language" => ["file1", "file4"], "java" => ["file1", "file4"], "also" => ["file1"], "stone" => ["file3"], "song" => ["file2"]}
The index
![Page 34: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/34.jpg)
SimpleSearch.search "ruby"
Results for token 'ruby':
 * file1
 * file2
 * file3
Search the index
![Page 35: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/35.jpg)
![Page 36: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/36.jpg)
It is very practical to know how search works.
For instance, you now know that the analysis step is very important.
It's more important than the “search” step.
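To see why analysis matters more than the lookup itself, compare two analyzers on the same query. A sketch, assuming a toy suffix rule (strip a trailing "s" from words longer than three letters) as a stand-in for a real stemmer such as Snowball; all names here are hypothetical.

```ruby
STOPWORDS = %w[a an and are as at but by for if in is it no not of on or that the then there these this to with]

# Analyzer 1: the naive pipeline from SimpleSearch (split, downcase, drop stopwords).
def plain_analyze(text)
  text.split(/\W+/).map(&:downcase).reject { |w| w.empty? || STOPWORDS.include?(w) }
end

# Analyzer 2: the same pipeline plus a deliberately crude plural stripper.
# NOT the Snowball/Porter algorithm; it only removes a final "s".
def stemming_analyze(text)
  plain_analyze(text).map { |w| w.length > 3 ? w.sub(/s\z/, '') : w }
end

# The same analyzer must be applied to documents and to queries.
def matches?(analyzer, document, query)
  (analyzer.call(document) & analyzer.call(query)).any?
end

SAMPLE_DOC = 'Ruby is a dynamic programming language.'

p matches?(method(:plain_analyze), SAMPLE_DOC, 'languages')     # => false
p matches?(method(:stemming_analyze), SAMPLE_DOC, 'languages')  # => true
```

The search step is identical in both cases; only the tokens produced by analysis decide whether "languages" finds the document.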
![Page 37: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/37.jpg)
![Page 38: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/38.jpg)
http://search-engines-book.com
Search Engines: Information Retrieval in Practice. Bruce Croft, Donald Metzler and Trevor Strohman. Addison Wesley, 2009
The Search Engine Textbook
![Page 39: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/39.jpg)
Lucene in Action. Michael McCandless, Erik Hatcher and Otis Gospodnetic. July, 2010
The Baseline Information Retrieval Implementation
SEARCH IMPLEMENTATIONS
http://manning.com/hatcher3
![Page 41: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/41.jpg)
ElasticSearch is an open source, scalable, distributed, cloud-ready, highly available full-text search engine and database with powerful aggregation features, communicating by JSON over RESTful HTTP, based on Apache Lucene.
![Page 42: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/42.jpg)
![Page 43: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/43.jpg)
HTTP / JSON / Schema-free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby
{ }
![Page 44: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/44.jpg)
# Add a document
curl -X POST \
     "http://localhost:9200/articles/article/1" \
     -d '{ "title" : "One" }'
INDEX TYPE ID
ELASTICSEARCH FEATURES
HTTP / JSON / Schema-free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby
DOCUMENT
![Page 45: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/45.jpg)
# Add a document
curl -X POST "http://localhost:9200/articles/article/1" -d '{ "title" : "One" }'

# Perform query
curl -X GET "http://localhost:9200/articles/_search?q=One"
curl -X POST "http://localhost:9200/articles/_search" -d '{
  "query" : { "terms" : { "tags" : ["ruby", "python"], "minimum_match" : 2 } }
}'

# Delete index
curl -X DELETE "http://localhost:9200/articles"
# Create index with settings and mapping
curl -X PUT "http://localhost:9200/articles" -d '{
  "settings" : { "index" : { "number_of_shards" : 3, "number_of_replicas" : 2 } },
  "mappings" : { "document" : { "properties" : { "body" : { "type" : "string", "analyzer" : "snowball" } } } }
}'
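Because every operation is plain JSON over HTTP, any HTTP client can replace curl. A minimal sketch using Ruby's standard `Net::HTTP`, assuming a node on `localhost:9200`; `create_index_request` is a hypothetical helper, and the request is only built here, not sent.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Base URL of a local node, as in the curl examples (an assumption about your setup).
ES_URL = URI('http://localhost:9200')

# Build (but do not send) the request that creates an index with shard/replica settings.
def create_index_request(name, shards:, replicas:)
  request = Net::HTTP::Put.new("/#{name}")
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(
    'settings' => { 'index' => { 'number_of_shards' => shards, 'number_of_replicas' => replicas } }
  )
  request
end

request = create_index_request('articles', shards: 3, replicas: 2)
puts "#{request.method} #{request.path}"   # => PUT /articles
puts request.body

# Sending it requires a running node:
# Net::HTTP.start(ES_URL.host, ES_URL.port) { |http| puts http.request(request).body }
```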
![Page 46: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/46.jpg)
http {
  server {
    listen       8080;
    server_name  search.example.com;

    error_log    elasticsearch-errors.log;
    access_log   elasticsearch.log;

    location / {

      # Deny access to Cluster API
      if ($request_filename ~ "_cluster") {
        return 403;
        break;
      }

      # Pass requests to ElasticSearch
      proxy_pass http://localhost:9200;
      proxy_redirect off;
      proxy_set_header  X-Real-IP        $remote_addr;
      proxy_set_header  X-Forwarded-For  $proxy_add_x_forwarded_for;
      proxy_set_header  Host             $http_host;

      # Authorize access
      auth_basic            "ElasticSearch";
      auth_basic_user_file  passwords;

      # Route all requests to authorized user's own index
      rewrite  ^(.*)$  /$remote_user$1  break;
      rewrite_log on;

      return 403;
    }

  }
}
GET http://user:password@localhost:8080/_search?q=* => http://localhost:9200/user/_search?q=*
https://gist.github.com/986390
#664 Add HTTPS and basic authentication support NO.
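The effect of the proxy configuration can be summarised as a small routing function. A sketch only: `route` is hypothetical, it does not check passwords, and it does not model nginx's exact matching (for example, `$request_filename` is a filesystem path derived from the URI, not the raw URI).

```ruby
# Hypothetical model of the proxy above; returns [status, upstream_path].
def route(remote_user, path)
  # "Deny access to Cluster API": any path mentioning _cluster is refused.
  return [403, nil] if path.include?('_cluster')
  # Without credentials, auth_basic answers 401 Unauthorized.
  return [401, nil] if remote_user.nil? || remote_user.empty?
  # "Route all requests to authorized user's own index": prefix the path with the user name.
  [200, "/#{remote_user}#{path}"]
end

p route('user', '/_search?q=*')      # => [200, "/user/_search?q=*"]
p route('user', '/_cluster/health')  # => [403, nil]
```

This mirrors the mapping shown on the slide: `GET /_search?q=*` as `user` is forwarded to `/user/_search?q=*`, so each user only ever sees an index named after them.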
![Page 47: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/47.jpg)
{ "id" : "abc123",
"title" : "ElasticSearch Understands JSON!",
"body" : "ElasticSearch not only “works” with JSON, it understands it! Let’s first ...",
"published_on" : "2011/05/27 10:00:00", "tags" : ["search", "json"],
"author" : { "first_name" : "Clara", "last_name" : "Rice", "email" : "[email protected]" }}
JSON
![Page 48: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/48.jpg)
curl -X DELETE "http://localhost:9200/articles"; sleep 1
curl -X POST "http://localhost:9200/articles/article" -d '
{ "id" : "abc123",
  "title" : "ElasticSearch Understands JSON!",
  "body" : "ElasticSearch not only “works” with JSON, it understands it! Let’s first ...",
  "published_on" : "2011/05/27 10:00:00",
  "tags" : ["search", "json"],
  "author" : { "first_name" : "Clara", "last_name" : "Rice", "email" : "[email protected]" }
}'
curl -X POST "http://localhost:9200/articles/_refresh"

curl -X GET \
     "http://localhost:9200/articles/article/_search?q=author.first_name:clara"
![Page 49: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/49.jpg)
curl -X GET "http://localhost:9200/articles/_mapping?pretty=true"
{
  "articles" : {
    "article" : {
      "properties" : {
        "title" : { "type" : "string" },
        // ...
        "author" : {
          "dynamic" : "true",
          "properties" : {
            "first_name" : { "type" : "string" },
            // ...
          }
        },
        "published_on" : {
          "format" : "yyyy/MM/dd HH:mm:ss||yyyy/MM/dd",
          "type" : "date"
        }
      }
    }
  }
}
curl -X POST "http://localhost:9200/articles/article" -d '
...
"published_on" : "2011/05/27 10:00:00",
...
![Page 50: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/50.jpg)
curl -X POST "http://localhost:9200/articles/comment" -d '
{ "body" : "Wow! Really nice JSON support.",
  "published_on" : "2011/05/27 10:05:00",
  "author" : { "first_name" : "John", "last_name" : "Pear", "email" : "[email protected]" }
}'
curl -X POST "http://localhost:9200/articles/_refresh"

curl -X GET \
     "http://localhost:9200/articles/comment/_search?q=author.first_name:john"
DIFFERENT TYPE
![Page 51: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/51.jpg)
curl -X GET "http://localhost:9200/articles/comment/_search?q=body:json"
curl -X GET "http://localhost:9200/articles/_search?q=body:json"
curl -X GET "http://localhost:9200/articles,users/_search?q=body:json"
curl -X GET "http://localhost:9200/_search?q=body:json"
Search single type
Search whole index
Search multiple indices
Search all indices
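The four URLs differ only in the path segments before `_search`. A sketch of a helper that builds them, assuming `_all` as the index placeholder when only a type is given; `search_path` is a hypothetical name, and the query is URL-encoded (so `body:json` becomes `body%3Ajson`, which the server decodes).

```ruby
require 'uri'

# Hypothetical helper that builds the search paths shown above:
#   no indices, no types          -> /_search              (all indices)
#   indices: ['a', 'b']           -> /a,b/_search          (several indices)
#   indices: ['a'], types: ['t']  -> /a/t/_search          (single type)
def search_path(query, indices: [], types: [])
  segments = []
  segments << (indices.empty? ? '_all' : indices.join(',')) unless indices.empty? && types.empty?
  segments << types.join(',') unless types.empty?
  segments << '_search'
  '/' + segments.join('/') + '?' + URI.encode_www_form(q: query)
end

puts search_path('body:json', indices: ['articles'], types: ['comment'])
puts search_path('body:json', indices: %w[articles users])
puts search_path('body:json')
```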
![Page 52: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/52.jpg)
curl -X DELETE "http://localhost:9200/articles"; sleep 1

curl -X POST "http://localhost:9200/articles/article" -d '
{ "id" : "abc123",
  "title" : "ElasticSearch Understands JSON!",
  "body" : "ElasticSearch not only “works” with JSON, it understands it! Let’s first ...",
  "published_on" : "2011/05/27 10:00:00",
  "tags" : ["search", "json"],
  "author" : { "first_name" : "Clara", "last_name" : "Rice", "email" : "[email protected]" }
}'
curl -X POST "http://localhost:9200/articles/_refresh"

curl -X GET "http://localhost:9200/articles/article/abc123"
![Page 53: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/53.jpg)
{"_index":"articles","_type":"article","_id":"1","_version":1, "_source" : { "id" : "1",
"title" : "ElasticSearch Understands JSON!",
"body" : "ElasticSearch not only “works” with JSON, it understands it! Let’s first ...",
"published_on" : "2011/05/27 10:00:00", "tags" : ["search", "json"],
"author" : { "first_name" : "Clara", "last_name" : "Rice", "email" : "[email protected]" }}}
“The Index Is Your Database”
![Page 54: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/54.jpg)
http://www.elasticsearch.org/guide/reference/api/admin-indices-aliases.html
my_alias
index_A
index_B
curl -X POST 'http://localhost:9200/_aliases' -d '{
  "actions" : [
    { "add" : { "index" : "index_1", "alias" : "myalias" } },
    { "add" : { "index" : "index_2", "alias" : "myalias" } }
  ]
}'
Index Aliases
![Page 55: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/55.jpg)
logs
The “Sliding Window” problem
logs_2010_02
logs_2010_03
logs_2010_04
curl -X DELETE "http://localhost:9200/logs_2010_01"
“We can really store only three months worth of data.”
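With one index per month, the sliding window is just date arithmetic plus an alias update. A sketch, assuming the `logs_YYYY_MM` naming from the slide and a `logs` alias that searches cover; `monthly_index`, `sliding_window` and `alias_actions` are hypothetical helpers. The generated JSON would be POSTed to `_aliases`, after which the dropped index can be deleted.

```ruby
require 'date'
require 'json'

# Hypothetical naming scheme from the slide: one index per month, e.g. "logs_2010_04".
def monthly_index(date)
  format('logs_%04d_%02d', date.year, date.month)
end

# For a window of `months` months ending at `today`, return the indices to keep
# and the one that just fell out of the window. Date#<< subtracts whole months.
def sliding_window(today, months: 3)
  keep = (0...months).map { |offset| monthly_index(today << offset) }.reverse
  drop = monthly_index(today << months)
  [keep, drop]
end

# Alias actions: point "logs" at the kept indices and detach the dropped one.
def alias_actions(keep, drop, alias_name: 'logs')
  actions = keep.map { |index| { 'add' => { 'index' => index, 'alias' => alias_name } } }
  actions << { 'remove' => { 'index' => drop, 'alias' => alias_name } }
  JSON.generate('actions' => actions)
end

keep, drop = sliding_window(Date.new(2010, 4, 15))
p keep  # => ["logs_2010_02", "logs_2010_03", "logs_2010_04"]
p drop  # => "logs_2010_01"
puts alias_actions(keep, drop)
```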
![Page 56: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/56.jpg)
curl -X PUT localhost:9200/_template/bookmarks_template -d '{
  "template" : "users_*",

  "settings" : { "index" : { "number_of_shards" : 1, "number_of_replicas" : 3 } },

  "mappings": {
    "url": {
      "properties": {
        "url":   { "type": "string", "analyzer": "url_ngram", "boost": 10 },
        "title": { "type": "string", "analyzer": "snowball",  "boost": 5 }
        // ...
      }
    }
  }
}'
Apply this configuration for every matching index being created
http://www.elasticsearch.org/guide/reference/api/admin-indices-templates.html
Index Templates
![Page 57: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/57.jpg)
Node 1   Node 2   Node 3   Node 4   MASTER
Automatic Discovery Protocol
http://www.elasticsearch.org/guide/reference/modules/discovery/
$ cat elasticsearch.yml
cluster:
  name: <YOUR APPLICATION>
![Page 58: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/58.jpg)
Index A, Shards: A1, A2, A3; Replicas: A1', A2', A3' and A1'', A2'', A3''
curl -XPUT 'http://localhost:9200/A/' -d '{
  "settings" : { "index" : { "number_of_shards" : 3, "number_of_replicas" : 2 } }
}'
The index is split into 3 shards, and each shard is duplicated in 2 replicas.
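Each document lives in exactly one primary shard, chosen by hashing its ID modulo the number of shards; replicas are full copies of those shards. A sketch under stated assumptions: `Zlib.crc32` stands in for Elasticsearch's own hash function, and `shard_for` is a hypothetical name. Because the shard depends on `hash % number_of_shards`, this is one reason the number of primary shards is fixed when the index is created.

```ruby
require 'zlib'

NUMBER_OF_SHARDS   = 3
NUMBER_OF_REPLICAS = 2

# Route a document ID to a primary shard (0-based). Zlib.crc32 is only a stand-in hash.
def shard_for(id, shards = NUMBER_OF_SHARDS)
  Zlib.crc32(id.to_s) % shards
end

# Every primary shard has NUMBER_OF_REPLICAS extra copies.
TOTAL_COPIES = NUMBER_OF_SHARDS * (1 + NUMBER_OF_REPLICAS)
puts "shard copies in the cluster: #{TOTAL_COPIES}"   # => 9 (A1..A3, A1'..A3', A1''..A3'')

%w[abc123 1 2 3].each { |id| puts "#{id} -> A#{shard_for(id) + 1}" }
```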
![Page 59: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/59.jpg)
SHARDS: Improve indexing performance
REPLICAS: Improve search performance
![Page 60: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/60.jpg)
Y U NO ASK FIRST???
![Page 61: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/61.jpg)
Indexing 100 000 documents (~ 56MB), one shard, no replicas, MacBookAir SSD 2GB
# Index all at once
time curl -s -X POST "http://localhost:9200/_bulk" \
     --data-binary @data/bulk_all.json > /dev/null

real 2m1.142s

# Index in batches of 1000
for file in data/bulk_*.json; do
  time curl -s -X POST "http://localhost:9200/_bulk" \
       --data-binary @$file > /dev/null
done

real 1m36.697s (-25 sec, 80%)

# Do not refresh during indexing in batches
"settings" : { "refresh_interval" : "-1" }
for file in data/bulk_*.json; do
...
real 0m38.859s (-82 sec, 32%)
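The batch files are Bulk API payloads: one action line followed by one source line per document, each terminated by a newline. A sketch of how such files could be generated, assuming the 2011-era format with `_type` (newer Elasticsearch versions drop mapping types); `bulk_payloads` and the sample documents are hypothetical.

```ruby
require 'json'

# Split documents into Bulk API payloads of `batch_size` documents each.
# Every document contributes two newline-terminated lines: the action and the source.
def bulk_payloads(documents, index:, type:, batch_size: 1000)
  documents.each_slice(batch_size).map do |batch|
    batch.map do |doc|
      action = { 'index' => { '_index' => index, '_type' => type, '_id' => doc['id'] } }
      JSON.generate(action) + "\n" + JSON.generate(doc) + "\n"
    end.join
  end
end

docs = (1..2500).map { |i| { 'id' => i.to_s, 'title' => "Document #{i}" } }
payloads = bulk_payloads(docs, index: 'articles', type: 'article')
puts payloads.size              # => 3
puts payloads.first.lines.size  # => 2000
# Each payload could be written to data/bulk_N.json and sent with curl --data-binary.
```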
![Page 62: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/62.jpg)
Terms: apple, apple iphone
Phrases: "apple iphone"
Proximity: "apple safari"~5
Fuzzy: apple~0.8
Wildcards: app*, *pp*
Boosting: apple^10 safari
Range: [2011/05/01 TO 2011/05/31], [java TO json]
Boolean: apple AND NOT iphone, +apple -iphone, (apple OR iphone) AND NOT review
Fields: title:iphone^15 OR body:iphone, published_on:[2011/05/01 TO "2011/05/27 10:00:00"]
http://lucene.apache.org/java/3_1_0/queryparsersyntax.html
$ curl -X GET "http://localhost:9200/_search?q=<YOUR QUERY>"
![Page 63: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/63.jpg)
curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '{
  "query" : { "terms" : { "tags" : [ "ruby", "python" ], "minimum_match" : 2 } }
}'
Query DSL
http://www.elasticsearch.org/guide/reference/query-dsl/
JSON
![Page 64: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/64.jpg)
curl -X POST "http://localhost:9200/venues/venue" -d '{
  "name": "Pizzeria",
  "pin": { "location": { "lat": 50.071712, "lon": 14.386832 } }
}'

curl -X POST "http://localhost:9200/venues/_search?pretty=true" -d '{
  "query" : {
    "filtered" : {
      "query" : { "query_string" : { "query" : "pizzeria" } },
      "filter" : {
        "geo_distance" : {
          "distance" : "0.5km",
          "pin.location" : { "lat" : 50.071481, "lon" : 14.387284 }
        }
      }
    }
  }
}'
Accepted formats for Geo:
[lon, lat] # Array
"lat,lon" # String
drm3btev3e86 # Geohash
Geo Search
http://www.elasticsearch.org/guide/reference/query-dsl/geo-distance-filter.html
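To check by hand that the pizzeria passes the 0.5km filter, compute the great-circle distance between the venue and the query point from the example above. A sketch using the haversine formula with a mean Earth radius of 6371 km; whether Elasticsearch uses exactly this formula depends on the configured distance type, so treat the result as an approximation.

```ruby
EARTH_RADIUS_KM = 6371.0

# Great-circle (haversine) distance between two points given in decimal degrees.
def haversine_km(lat1, lon1, lat2, lon2)
  to_rad = ->(deg) { deg * Math::PI / 180 }
  dlat = to_rad.(lat2 - lat1)
  dlon = to_rad.(lon2 - lon1)
  a = Math.sin(dlat / 2)**2 +
      Math.cos(to_rad.(lat1)) * Math.cos(to_rad.(lat2)) * Math.sin(dlon / 2)**2
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a))
end

# Venue (indexed document) and query point from the geo_distance example.
distance = haversine_km(50.071712, 14.386832, 50.071481, 14.387284)
puts format('%.3f km', distance)
puts distance <= 0.5   # the venue is inside the 0.5km radius
```

The two points are only a few tens of metres apart, well within the radius.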
![Page 65: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/65.jpg)
http://blog.linkedin.com/2009/12/14/linkedin-faceted-search/
Query
Facets
![Page 66: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/66.jpg)
curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '{
  "query"  : { "query_string" : { "query" : "title:T*" } },
  "filter" : { "terms" : { "tags" : ["ruby"] } },
  "facets" : { "tags" : { "terms" : { "field" : "tags", "size" : 10 } } }
}'

# "facets" : {
#   "tags" : {
#     "terms" : [
#       { "term" : "ruby",   "count" : 2 },
#       { "term" : "python", "count" : 1 },
#       { "term" : "java",   "count" : 1 }
#     ]
#   }
# }
User query
“Checkboxes”
Facets
http://www.elasticsearch.org/guide/reference/api/search/facets/index.html
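A terms facet is essentially a count of field values over the documents matched by the query, sorted by frequency. A sketch over in-memory data, assuming documents shaped like the articles indexed with Tire later in the talk; `terms_facet` is a hypothetical name, and it ignores how the top-level `filter` in the request above interacts with facets.

```ruby
# Hypothetical documents, mirroring the articles stored with Tire.
ARTICLES = [
  { title: 'One',   tags: ['ruby'] },
  { title: 'Two',   tags: ['ruby', 'python'] },
  { title: 'Three', tags: ['java'] },
  { title: 'Four',  tags: ['ruby', 'php'] }
]

# Count each term of `field` across the documents, most frequent first
# (ties broken alphabetically), keeping at most `size` entries.
def terms_facet(documents, field, size: 10)
  counts = Hash.new(0)
  documents.each { |doc| doc[field].each { |term| counts[term] += 1 } }
  counts.sort_by { |term, count| [-count, term] }.first(size)
        .map { |term, count| { 'term' => term, 'count' => count } }
end

p terms_facet(ARTICLES, :tags)
matching = ARTICLES.select { |doc| doc[:title].start_with?('T') }   # roughly title:T*
p terms_facet(matching, :tags)
```

These counts are what a UI turns into "checkboxes" next to the result list.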
![Page 67: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/67.jpg)
curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '{
  "facets" : {
    "published_on" : {
      "date_histogram" : { "field" : "published_on", "interval" : "day" }
    }
  }
}'
![Page 68: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/68.jpg)
Geo Facets
curl -X POST "http://localhost:9200/venues/_search?pretty=true" -d '{
  "query" : { "query_string" : { "query" : "pizzeria" } },
  "facets" : {
    "distance_count" : {
      "geo_distance" : {
        "pin.location" : { "lat" : 50.071712, "lon" : 14.386832 },
        "ranges" : [
          { "to" : 1 },
          { "from" : 1, "to" : 5 },
          { "from" : 5, "to" : 10 }
        ]
      }
    }
  }
}'
http://www.elasticsearch.org/guide/reference/api/search/facets/geo-distance-facet.html
![Page 69: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/69.jpg)
def analyze content
  # >>> Split content by words into "tokens"
  content.split(/\W/).
  # >>> Downcase every word
  map { |word| word.downcase }.
  # ...
end
Remember?
curl -X DELETE "http://localhost:9200/articles"
curl -X POST "http://localhost:9200/articles" -d '{
  "mappings": {
    "article": {
      "properties": {
        "tags":    { "type": "string", "analyzer": "keyword" },
        "content": { "type": "string", "analyzer": "snowball" },
        "title":   { "type": "string", "analyzer": "snowball", "boost": 10.0 }
      }
    }
  }
}'
curl -X GET 'http://localhost:9200/articles/_mapping?pretty=true'
http://www.elasticsearch.org/guide/reference/api/admin-indices-create-index.html
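The mapping above indexes `tags` with the `keyword` analyzer and `content` with `snowball`. A simplified Ruby illustration of why that matters for tags: a keyword-style analyzer keeps the whole value as a single token, while a word-splitting one breaks a multi-word tag into separate terms, which would then show up separately in a terms facet. Both methods are hypothetical stand-ins, and the second one skips the stopword removal and stemming that snowball performs.

```ruby
# Keyword-style analysis: the whole value becomes a single token.
def keyword_analyze(value)
  [value]
end

# Word-splitting analysis: split on non-word characters, drop empty
# strings and downcase. Snowball would also remove stopwords and stem.
def word_analyze(value)
  value.split(/\W+/).reject(&:empty?).map(&:downcase)
end

tags = ['Ruby on Rails', 'ruby']
p tags.flat_map { |tag| keyword_analyze(tag) } # one term per tag
p tags.flat_map { |tag| word_analyze(tag) }    # "ruby" twice, plus "on" and "rails"
```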
![Page 70: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/70.jpg)
curl -X DELETE "http://localhost:9200/urls"
curl -X POST "http://localhost:9200/urls" -d '{
  "settings" : {
    "index" : {
      "analysis" : {
        "analyzer" : {
          "url_analyzer" : {
            "type" : "custom",
            "tokenizer" : "lowercase",
            "filter" : ["stop", "url_stop", "url_ngram"]
          }
        },
        "filter" : {
          "url_stop" : { "type" : "stop", "stopwords" : ["http", "https", "www"] },
          "url_ngram" : { "type" : "nGram", "min_gram" : 3, "max_gram" : 5 }
        }
      }
    }
  }
}'
https://gist.github.com/988923
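To see which terms the `url_analyzer` produces, here is a plain Ruby approximation of its chain: a lowercase tokenizer that splits on non-letters, the `stop` and `url_stop` filters, then 3 to 5 character n-grams. The stopword list is a small assumed subset of the English default, and the token order is not necessarily the one ElasticSearch emits; for the real output, ask the `_analyze` API.

```ruby
# Assumed: a small subset of the English stopwords used by the "stop" filter.
ENGLISH_STOPWORDS = %w[a an and are as at be by for in is of on or the to with].freeze
# The custom "url_stop" list from the settings above.
URL_STOPWORDS = %w[http https www].freeze

# Like the "lowercase" tokenizer: split on non-letters, then lowercase.
def lowercase_tokenize(text)
  text.scan(/\p{L}+/).map(&:downcase)
end

# Like the "nGram" filter with min_gram 3 and max_gram 5.
def ngrams(token, min_gram: 3, max_gram: 5)
  (min_gram..max_gram).flat_map do |n|
    (0..token.length - n).map { |i| token[i, n] }
  end
end

def url_analyze(url)
  lowercase_tokenize(url)
    .reject { |token| ENGLISH_STOPWORDS.include?(token) }
    .reject { |token| URL_STOPWORDS.include?(token) }
    .flat_map { |token| ngrams(token) }
end

p url_analyze('http://www.elasticsearch.org/guide')
```

The n-grams are what let a search for a fragment such as `elast` match the URL.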
![Page 71: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/71.jpg)
Tire.index 'articles' do
  delete
  create

  store :title => 'One',   :tags => ['ruby'],           :published_on => '2011-01-01'
  store :title => 'Two',   :tags => ['ruby', 'python'], :published_on => '2011-01-02'
  store :title => 'Three', :tags => ['java'],           :published_on => '2011-01-02'
  store :title => 'Four',  :tags => ['ruby', 'php'],    :published_on => '2011-01-03'

  refresh
end
s = Tire.search 'articles' do
  query { string 'title:T*' }

  filter :terms, :tags => ['ruby']

  sort { title 'desc' }

  facet('global-tags') { terms :tags, :global => true }

  facet('current-tags') { terms :tags }
end
http://github.com/karmi/tire
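The Tire search above combines a query, a filter, a sort and two facets. A small in-memory Ruby simulation over the four stored articles shows what each part contributes: the query picks the matching articles, the filter and sort only shape the hits, `current-tags` counts tags over the query results, and `global-tags` (`:global => true`) counts over the whole index. This is an approximation, not Tire: `title:T*` is treated as a case-sensitive prefix match on the raw title.

```ruby
# The four articles stored in the Tire example above.
ARTICLES = [
  { title: 'One',   tags: %w[ruby] },
  { title: 'Two',   tags: %w[ruby python] },
  { title: 'Three', tags: %w[java] },
  { title: 'Four',  tags: %w[ruby php] }
].freeze

# Term counts over a set of documents, like a terms facet on :tags.
def tag_counts(docs)
  docs.flat_map { |doc| doc[:tags] }.tally
end

matching = ARTICLES.select { |a| a[:title].start_with?('T') } # query:  title:T* (simplified)
hits = matching.select { |a| a[:tags].include?('ruby') }      # filter: tags = ruby
hits = hits.sort_by { |a| a[:title] }.reverse                 # sort:   title desc

global_tags  = tag_counts(ARTICLES) # :global => true, the query is ignored
current_tags = tag_counts(matching) # query results; the filter is not applied

p hits.map { |a| a[:title] }
p global_tags
p current_tags
```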
![Page 72: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/72.jpg)
class Article < ActiveRecord::Base
  include Tire::Model::Search
  include Tire::Model::Callbacks
end
http://github.com/karmi/tire
Article.search do
  query { string 'love' }
  facet('timeline') { date :published_on, :interval => 'month' }
  sort { published_on 'desc' }
end
$ rake environment tire:import CLASS='Article'
![Page 73: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/73.jpg)
class Article
  include Whatever::ORM

  include Tire::Model::Search
  include Tire::Model::Callbacks
end
http://github.com/karmi/tire
Article.search do
  query { string 'love' }
  facet('timeline') { date :published_on, :interval => 'month' }
  sort { published_on 'desc' }
end
$ rake environment tire:import CLASS='Article'
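`Tire::Model::Callbacks` is what keeps the index in sync: when a record is saved or destroyed, it is re-indexed or removed from the index. The idea fits in a few lines of plain Ruby. This is a sketch of the mechanism, not Tire's implementation; `InMemoryIndex`, `IndexCallbacks` and `PlainArticle` are hypothetical names standing in for ElasticSearch, the callbacks module and a model built on some ORM.

```ruby
# Hypothetical stand-in for an ElasticSearch index.
class InMemoryIndex
  attr_reader :documents

  def initialize
    @documents = {}
  end

  def store(id, attributes)
    @documents[id] = attributes
  end

  def remove(id)
    @documents.delete(id)
  end
end

# Hypothetical callbacks module: after the model's own save/destroy
# succeeds, mirror the change in the index.
module IndexCallbacks
  def save
    saved = super
    self.class.index.store(id, to_indexed_hash) if saved
    saved
  end

  def destroy
    destroyed = super
    self.class.index.remove(id) if destroyed
    destroyed
  end
end

# Hypothetical model; save/destroy stand in for whatever the ORM does.
class PlainArticle
  prepend IndexCallbacks

  attr_reader :id, :title

  def self.index
    @index ||= InMemoryIndex.new
  end

  def initialize(id, title)
    @id = id
    @title = title
  end

  def save
    true # pretend the ORM persisted the record
  end

  def destroy
    true # pretend the ORM deleted the record
  end

  def to_indexed_hash
    { title: title }
  end
end

article = PlainArticle.new(1, 'Love')
article.save
p PlainArticle.index.documents # the article is now "searchable"
article.destroy
p PlainArticle.index.documents # and gone again
```

Using `prepend` lets the module wrap the model's own `save` and `destroy` without the model knowing about the index, which is roughly the role ORM callbacks play in a real application.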
![Page 74: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/74.jpg)
$ rails new tired -m "https://gist.github.com/raw/951343/tired.rb"
A “batteries included” installation. Downloads and launches ElasticSearch. Sets up a Rails application and launches it. When you're tired of it, just delete the folder.
Try ElasticSearch in a Ruby on Rails application with a one-line command
![Page 75: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)](https://reader034.vdocument.in/reader034/viewer/2022052301/554f7821b4c9052a518b487e/html5/thumbnails/75.jpg)
Thanks!