08. elasticsearch : sorting and relevance

ElasticSearch Sorting and Relevance http://elastic.openthinklabs.com/

Uploaded by openthink-labs

Posted on 22-Jan-2018

Sorting

GET /_search
{
  "query" : {
    "bool" : {
      "filter" : {
        "term" : { "user_id" : 1 }
      }
    }
  }
}

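GET /_search
{
  "query" : {
    "constant_score" : {
      "filter" : {
        "term" : { "user_id" : 1 }
      }
    }
  }
}

Neither query computes a useful relevance score. With only a filter clause, the bool query gives every matching document a _score of 0, while constant_score applies the same score (1 by default) to every match. The order of such results says nothing about relevance, so it makes sense to sort them by some other criterion.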

Sorting by Field Values

GET /_search
{
  "query" : {
    "bool" : {
      "filter" : { "term" : { "user_id" : 1 }}
    }
  },
  "sort": { "date": { "order": "desc" }}
}

"hits" : { "total" : 6, "max_score" : null, "hits" : [ { "_index" : "us", "_type" : "tweet", "_id" : "14", "_score" : null, "_source" : { "date": "2014-09-24", ... }, "sort" : [ 1411516800000 ] }, ...}

Multilevel Sorting

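In the following request, results are ordered first by date, and _score only breaks ties between documents that have the same date:

GET /_search
{
  "query" : {
    "bool" : {
      "must":   { "match": { "tweet": "manage text search" }},
      "filter" : { "term" : { "user_id" : 2 }}
    }
  },
  "sort": [
    { "date":   { "order": "desc" }},
    { "_score": { "order": "desc" }}
  ]
}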
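
The same two-level ordering can be modeled in Python over hypothetical, already-scored hits (the ids, dates, and scores are invented for illustration). It mimics only the comparison logic, and `multilevel_sort` is a hypothetical helper.

```python
from datetime import date


def multilevel_sort(hits: list) -> list:
    """Order by date descending, then by _score descending for equal dates."""
    # Both keys are descending, so one reversed sort on a (date, score) tuple works.
    return sorted(hits, key=lambda hit: (hit["date"], hit["_score"]), reverse=True)


# Hypothetical scored hits.
hits = [
    {"_id": "a", "date": date(2014, 9, 20), "_score": 0.5},
    {"_id": "b", "date": date(2014, 9, 22), "_score": 0.1},
    {"_id": "c", "date": date(2014, 9, 20), "_score": 0.9},
]

print([hit["_id"] for hit in multilevel_sort(hits)])  # ['b', 'c', 'a']
```

Hit b comes first despite its low score because the date is compared first. Mixing directions (say, date descending and score ascending) would need a negated key or two stable sorts instead of a single reversed tuple sort.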

Sorting on Multivalue Fields

"sort": { "dates": { "order": "asc", "mode": "min" }}

String Sorting and Multifields

"tweet": { "type": "string", "analyzer": "english"}

"tweet": { "type": "string", "analyzer": "english", "fields": { "raw": { "type": "string", "index": "not_analyzed" } }}

Search on the analyzed tweet field and sort on tweet.raw:

GET /_search
{
  "query": {
    "match": { "tweet": "elasticsearch" }
  },
  "sort": "tweet.raw"
}
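
Why sort on tweet.raw rather than tweet? The Python sketch below uses a deliberately simplified analyzer (lowercase plus whitespace split, a hypothetical stand-in for the english analyzer, which also removes stopwords and stems) to show that the analyzed field gives each document several terms, so an ascending sort must reduce them to one (here the smallest, as with mode min), while the raw value is one term that sorts as a whole. The sample tweets and `simple_analyze` are hypothetical.

```python
def simple_analyze(text: str) -> list:
    """Hypothetical stand-in for an analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


# Hypothetical tweets keyed by document id.
tweets = {
    "1": "Zebras and Elasticsearch",
    "2": "Apples like Elasticsearch",
}

# Analyzed field: each tweet is a bag of terms, reduced here to its smallest term.
by_analyzed = sorted(tweets, key=lambda doc_id: min(simple_analyze(tweets[doc_id])))

# not_analyzed "raw" subfield: the whole string is a single term.
by_raw = sorted(tweets, key=lambda doc_id: tweets[doc_id])

print(by_analyzed)  # ['1', '2']  ("and" sorts before "apples")
print(by_raw)       # ['2', '1']  ("Apples ..." sorts before "Zebras ...")
```

The analyzed ordering is driven by whichever word happens to be smallest, which is rarely what a reader expects from an alphabetical sort.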

What Is Relevance?

● The standard similarity algorithm used in Elasticsearch, term frequency/inverse document frequency (TF/IDF), takes the following factors into account:
  ● Term frequency: How often does the term appear in the field? The more often, the more relevant. A field containing five mentions of the same term is more likely to be relevant than a field containing just one mention.
  ● Inverse document frequency: How often does each term appear in the index? The more often, the less relevant. Terms that appear in many documents have a lower weight than rarer terms.
  ● Field-length norm: How long is the field? The longer it is, the less likely it is that words in the field will be relevant. A term appearing in a short title field carries more weight than the same term appearing in a long content field.
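
To make the three factors concrete, the Python sketch below uses the formulas documented for Lucene's classic TF/IDF similarity: tf = √freq, idf = 1 + ln(numDocs / (docFreq + 1)), and norm = 1 / √numTerms. Multiplying them into a single per-term weight (`term_weight`, a hypothetical helper) is a simplification, not Lucene's full formula: the practical scoring function also applies query normalization, a coordination factor, and boosts, and squares idf, and Lucene stores the norm in a single lossy byte, so real explain output will differ.

```python
import math


def tf(freq: int) -> float:
    """Term frequency: grows with the square root of the term's count in the field."""
    return math.sqrt(freq)


def idf(num_docs: int, doc_freq: int) -> float:
    """Inverse document frequency: the fewer documents contain the term, the higher."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_length_norm(num_terms: int) -> float:
    """Field-length norm: the shorter the field, the higher."""
    return 1.0 / math.sqrt(num_terms)


def term_weight(freq: int, num_docs: int, doc_freq: int, num_terms: int) -> float:
    """Simplified per-term weight: tf * idf * norm (not the full Lucene formula)."""
    return tf(freq) * idf(num_docs, doc_freq) * field_length_norm(num_terms)


print(tf(4))                  # 2.0
print(field_length_norm(4))   # 0.5
print(round(idf(100, 9), 3))  # 3.303

# Same term, same index statistics: a short field with 5 mentions
# outweighs a long field with 1 mention.
short_field = term_weight(freq=5, num_docs=100, doc_freq=9, num_terms=10)
long_field = term_weight(freq=1, num_docs=100, doc_freq=9, num_terms=1000)
print(short_field > long_field)  # True
```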

Understanding the Score

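Adding the explain parameter to a search request returns, with each hit, a breakdown of how its _score was calculated:

GET /_search?explain
{
  "query" : { "match" : { "tweet" : "honeymoon" }}
}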
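
With explain enabled, each hit in the response carries an `_explanation` object: a tree of nodes, each with a `value`, a `description`, and a list of child `details`. The Python sketch below flattens such a tree into indented lines so it is easier to read. The sample tree and `format_explanation` are hypothetical and its numbers are invented; real explanations are much deeper and use Lucene's own descriptions.

```python
def format_explanation(node: dict, depth: int = 0) -> list:
    """Flatten an explanation tree into indented 'value: description' lines."""
    lines = ["  " * depth + f"{node['value']}: {node['description']}"]
    for child in node.get("details", []):
        lines.extend(format_explanation(child, depth + 1))
    return lines


# Hand-written, hypothetical explanation; the numbers are invented.
explanation = {
    "value": 0.5,
    "description": "score, product of:",
    "details": [
        {"value": 1.0, "description": "tf", "details": []},
        {"value": 2.0, "description": "idf", "details": []},
        {"value": 0.25, "description": "fieldNorm", "details": []},
    ],
}

print("\n".join(format_explanation(explanation)))
# 0.5: score, product of:
#   1.0: tf
#   2.0: idf
#   0.25: fieldNorm
```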

Understanding Why a Document Matched

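For a single document, the _explain API reports whether and why it matched a query. Here it is applied to document 12 in the us index:

GET /us/tweet/12/_explain
{
  "query" : {
    "bool" : {
      "filter" : { "term" :  { "user_id" : 2 }},
      "must" :  { "match" : { "tweet" : "honeymoon" }}
    }
  }
}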

"failure to match filter: cache(user_id:[2 TO 2])"

Doc Values Intro

● Doc values are used in several places in Elasticsearch:
  ● Sorting on a field
  ● Aggregations on a field
  ● Certain filters (for example, geolocation filters)
  ● Scripts that refer to fields
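
The inverted index maps each term to the documents that contain it, which suits search; sorting and aggregations need the reverse lookup, from a document to its value, and doc values store that document-to-value mapping column by column. The toy Python model below contrasts the two lookups. It is a conceptual sketch with hypothetical data and helper names (`search`, `sort_by_field`, `terms_aggregation`), not how Lucene lays doc values out on disk, and it assumes dense document ids with one value each.

```python
from collections import Counter, defaultdict

# Hypothetical field values for documents with dense ids 0..3.
colors = ["red", "blue", "red", "green"]

# Inverted index: term -> ids of the documents containing it (used for search).
inverted = defaultdict(list)
for doc_id, term in enumerate(colors):
    inverted[term].append(doc_id)

# Doc values: doc id -> value, stored as one column indexed by doc id.
doc_values = list(colors)


def search(term):
    """Find matching documents through the inverted index."""
    return list(inverted.get(term, []))


def sort_by_field(doc_ids):
    """Sort documents by their field value, looked up in the doc-values column."""
    return sorted(doc_ids, key=lambda doc_id: doc_values[doc_id])


def terms_aggregation(doc_ids):
    """Count documents per field value, again reading the doc-values column."""
    return Counter(doc_values[doc_id] for doc_id in doc_ids)


print(search("red"))                    # [0, 2]
print(sort_by_field([0, 1, 2, 3]))      # [1, 3, 0, 2]
print(terms_aggregation([0, 1, 2, 3]))  # Counter({'red': 2, 'blue': 1, 'green': 1})
```

Answering "what is the value of document 3?" from the inverted index alone would mean scanning every term's posting list, which is why sorting and aggregations read the column instead.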