Harnessing The Power of Search
André Ricardo Barreto de Oliveira ("Arbo")
Software Engineer - Team Lead - Search
Darmstadt, Germany
7 October, 2015
What's Search and why is it so cool?
The dawn of Search
Searching higher
Search and the Digital Experience
Understanding Search
Inside the Search Engine
The Index
Documents
Fields
Not that different from ye olde database?...
Indexing documents
PUT /megacorp/employee/1
{
    "first_name" : "John",
    "last_name" : "Smith",
    "age" : 25,
    "about" : "I love to go rock climbing",
    "interests": [ "sports", "music" ]
}
PUT /megacorp/employee/2
{
    "first_name" : "Jane",
    "last_name" : "Smith",
    "age" : 32,
    "about" : "I like to collect rock albums",
    "interests": [ "music" ]
}
PUT /megacorp/employee/3
{
    "first_name" : "Douglas",
    "last_name" : "Fir",
    "age" : 35,
    "about": "I like to build cabinets",
    "interests": [ "forestry" ]
}
Queries and Filters
GET /megacorp/employee/_search?q=last_name:Smith
"hits": [
    {
        "_source": {
            "first_name": "John",
            "last_name": "Smith",
            "age": 25,
            "about": "I love to go rock climbing",
            "interests": [ "sports", "music" ]
        }
    },
    {
        "_source": {
            "first_name": "Jane",
            "last_name": "Smith",
            "age": 32,
            "about": "I like to collect rock albums",
            "interests": [ "music" ]
        }
    }
]
GET /megacorp/employee/_search
{
    "query" : {
        "filtered" : {
            "filter" : {
                "range" : {
                    "age" : { "gt" : 21 }
                }
            },
            "query" : {
                "match" : {
                    "last_name" : "smith"
                }
            }
        }
    }
}
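The filtered query above combines a yes/no filter (age greater than 21) with a query (last name matches "smith"). A minimal in-memory sketch of that combination, not how Elasticsearch executes it: the `FilteredQuerySketch` class, the `Employee` type and the `search` method are hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FilteredQuerySketch {

    // Hypothetical stand-in for an indexed employee document.
    static final class Employee {
        final String firstName;
        final String lastName;
        final int age;

        Employee(String firstName, String lastName, int age) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.age = age;
        }
    }

    // The three employees indexed on the earlier slide.
    static final List<Employee> EMPLOYEES = Arrays.asList(
        new Employee("John", "Smith", 25),
        new Employee("Jane", "Smith", 32),
        new Employee("Douglas", "Fir", 35));

    // Returns the first names of employees that pass the filter and match the query.
    static List<String> search(int olderThan, String lastName) {
        List<String> hits = new ArrayList<>();
        for (Employee employee : EMPLOYEES) {
            // Filter: like "range": { "age": { "gt": 21 } }
            boolean passesFilter = employee.age > olderThan;
            // Query: like "match": { "last_name": "smith" }, compared case-insensitively here
            boolean matchesQuery = employee.lastName.equalsIgnoreCase(lastName);
            if (passesFilter && matchesQuery) {
                hits.add(employee.firstName);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(search(21, "smith"));
    }
}
```

Calling `search(21, "smith")` returns John and Jane; Douglas passes the age filter but does not match the query.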
Full-Text Search
GET /megacorp/employee/_search
{
    "query" : {
        "match" : {
            "about" : "rock climbing"
        }
    }
}
"hits": [
    {
        "_score": 0.16273327,
        "_source": {
            "first_name": "John",
            "last_name": "Smith",
            "age": 25,
            "about": "I love to go rock climbing",
            "interests": [ "sports", "music" ]
        }
    },
    {
        "_score": 0.016878016,
        "_source": {
            "first_name": "Jane",
            "last_name": "Smith",
            "age": 32,
            "about": "I like to collect rock albums",
            "interests": [ "music" ]
        }
    }
]
Analysis and Analyzers
Set the shape to semi-transparent by calling Set_Trans(5)
Standard analyzer
set, the, shape, to, semi, transparent, by, calling, set_trans, 5
Simple analyzer
set, the, shape, to, semi, transparent, by, calling, set, trans
Whitespace analyzer
Set, the, shape, to, semi-transparent, by, calling, Set_Trans(5)
English language analyzer
set, shape, semi, transpar, call, set_tran, 5
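The differences between analyzers are easier to see in code. Below is a small sketch that approximates the simple and whitespace analyzers with plain string operations. It is an illustration only, not the Lucene implementation, and `AnalyzerSketch` with its two methods is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

class AnalyzerSketch {

    // Roughly what the simple analyzer does: split on anything that is
    // not a letter, then lowercase every token.
    static List<String> simple(String text) {
        return Arrays.stream(text.split("[^\\p{L}]+"))
            .filter(token -> !token.isEmpty())
            .map(token -> token.toLowerCase(Locale.ROOT))
            .collect(Collectors.toList());
    }

    // Roughly what the whitespace analyzer does: split on whitespace only,
    // leaving case and punctuation untouched.
    static List<String> whitespace(String text) {
        return Arrays.stream(text.trim().split("\\s+"))
            .filter(token -> !token.isEmpty())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "Set the shape to semi-transparent by calling Set_Trans(5)";
        System.out.println(simple(text));
        System.out.println(whitespace(text));
    }
}
```

Run against the example sentence, the two methods produce the same token lists as the simple and whitespace analyzer slides above.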
Field mappings
{
    "number_of_clicks": {
        "type": "integer"
    }
}
{
    "tag": {
        "type": "string",
        "index": "not_analyzed"
    }
}
{
    "tweet": {
        "type": "string",
        "analyzer": "english"
    }
}
Analytics and Aggregations
GET /megacorp/employee/_search
{
    "query": {
        "match": { "last_name": "smith" }
    },
    "aggs" : {
        "all_interests" : {
            "terms" : { "field" : "interests" },
            "aggs" : {
                "avg_age" : {
                    "avg" : { "field" : "age" }
                }
            }
        }
    }
}
"buckets": [
{
"key": "music",
"doc_count": 2,
"avg_age": {
"value": 28.5
}
},
{
"key": "sports",
"doc_count": 1,
"avg_age": {
"value": 25
}
}
]
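The buckets above can be reproduced by hand: keep the employees matching the query, group them by interest, and average the ages in each group. A minimal in-memory sketch, assuming the three employees indexed earlier; `AggregationSketch` and `averageAgeByInterest` are hypothetical names, and this is not how Elasticsearch computes aggregations internally.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class AggregationSketch {

    // Hypothetical stand-in for an indexed employee document.
    static final class Employee {
        final String lastName;
        final int age;
        final List<String> interests;

        Employee(String lastName, int age, String... interests) {
            this.lastName = lastName;
            this.age = age;
            this.interests = Arrays.asList(interests);
        }
    }

    static final List<Employee> EMPLOYEES = Arrays.asList(
        new Employee("Smith", 25, "sports", "music"),
        new Employee("Smith", 32, "music"),
        new Employee("Fir", 35, "forestry"));

    // One bucket per interest among employees with the given last name,
    // mapped to the average age inside that bucket.
    static Map<String, Double> averageAgeByInterest(String lastName) {
        Map<String, int[]> sumAndCount = new TreeMap<>();
        for (Employee employee : EMPLOYEES) {
            if (!employee.lastName.equalsIgnoreCase(lastName)) {
                continue;
            }
            for (String interest : employee.interests) {
                int[] acc = sumAndCount.computeIfAbsent(interest, key -> new int[2]);
                acc[0] += employee.age; // running sum of ages
                acc[1]++;               // doc_count for this bucket
            }
        }
        Map<String, Double> averages = new TreeMap<>();
        sumAndCount.forEach(
            (interest, acc) -> averages.put(interest, (double) acc[0] / acc[1]));
        return averages;
    }

    public static void main(String[] args) {
        System.out.println(averageAgeByInterest("smith"));
    }
}
```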
The Liferay Search Infrastructure
The Liferay Search architecture
Liferay Portal
Assets: web content, message boards, wiki pages...
Search infrastructure (Magic happens here)
Search engine(s)
Indices, documents, analysis...
The Liferay Search Engine plugins
public interface SearchEngine {
    public IndexSearcher getIndexSearcher();
    public IndexWriter getIndexWriter();
}
public class ElasticsearchSearchEngine extends BaseSearchEngine
public class ElasticsearchIndexSearcher extends BaseIndexSearcher
public class ElasticsearchIndexWriter extends BaseIndexWriter
public class SolrSearchEngine extends BaseSearchEngine
public class SolrIndexSearcher extends BaseIndexSearcher
public class SolrIndexWriter extends BaseIndexWriter
Solr: schema.xml
<fields>
<field indexed="true"
name="articleId"
stored="true"
type="string_keyword_lowercase"
/>
<field indexed="true"
name="companyId"
stored="true"
type="long"
/>
<field indexed="true"
name="emailAddress"
stored="true"
type="string"
/>
</fields>
The Liferay Document Mappings
Elasticsearch: liferay-type-mappings.json
"LiferayDocumentType": {
"properties": {
"articleId": {
"analyzer": "keyword_lowercase",
"store": "yes",
"type": "string"
},
"companyId": {
"index": "not_analyzed",
"store": "yes",
"type": "string"
},
"emailAddress": {
"index": "not_analyzed",
"store": "yes",
"type": "string"
}
}
}
From Portal assets to Index documents…
public interface Indexer<T> {
    public Document getDocument(T object);
}
public class JournalArticleIndexer extends BaseIndexer<JournalArticle> {
    protected Document doGetDocument(JournalArticle journalArticle) {
        Document document = getBaseModelDocument(CLASS_NAME, journalArticle);
        document.addText(
            LocalizationUtil.getLocalizedName(Field.CONTENT, languageId),
            content);
        document.addKeyword(
            Field.VERSION, journalArticle.getVersion());
        document.addDate(
            "displayDate", journalArticle.getDisplayDate());
        return document;
    }
}
public class MBMessageIndexer extends BaseIndexer<MBMessage> {
    protected Document doGetDocument(MBMessage mbMessage) {
        Document document = getBaseModelDocument(CLASS_NAME, mbMessage);
        document.addText(
            Field.CONTENT, processContent(mbMessage));
        document.addKeyword(
            "discussion", discussion != null);
        if (mbMessage.isAnonymous()) {
            document.remove(Field.USER_NAME);
        }
        return document;
    }
}
public interface Document {
    public void addKeyword(String name, String value);
    public void addNumber(String name, long value);
}
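To make the asset-to-document step concrete, here is a self-contained sketch of the same pattern: an indexer copies selected properties of an asset into named fields of a document. The nested `Document` and `Indexer` interfaces are cut-down copies of the ones on the slides; `MapDocument`, `Article` and `ArticleIndexer` are hypothetical names, not Liferay classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class IndexerSketch {

    // Cut-down stand-ins for the interfaces on the slides, not Liferay's real types.
    interface Document {
        void addKeyword(String name, String value);
        void addNumber(String name, long value);
    }

    interface Indexer<T> {
        Document getDocument(T object);
    }

    // Hypothetical document implementation that keeps its fields in a map.
    static final class MapDocument implements Document {
        final Map<String, Object> fields = new LinkedHashMap<>();

        @Override
        public void addKeyword(String name, String value) {
            fields.put(name, value);
        }

        @Override
        public void addNumber(String name, long value) {
            fields.put(name, value);
        }
    }

    // Hypothetical asset with two properties worth indexing.
    static final class Article {
        final String articleId;
        final long companyId;

        Article(String articleId, long companyId) {
            this.articleId = articleId;
            this.companyId = companyId;
        }
    }

    // Maps an Article to a document, one field per property.
    static final class ArticleIndexer implements Indexer<Article> {
        @Override
        public MapDocument getDocument(Article article) {
            MapDocument document = new MapDocument();
            document.addKeyword("articleId", article.articleId);
            document.addNumber("companyId", article.companyId);
            return document;
        }
    }

    public static void main(String[] args) {
        MapDocument document = new ArticleIndexer().getDocument(new Article("A1", 20116L));
        System.out.println(document.fields);
    }
}
```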
… from Search Box to queries and filters
public class JournalArticleIndexer
    extends BaseIndexer<JournalArticle> {
    public void postProcessSearchQuery(
        BooleanQuery searchQuery,
        BooleanFilter fullQueryBooleanFilter,
        SearchContext searchContext) {
        addSearchTerm(searchQuery, searchContext,
            Field.ARTICLE_ID, false);
        addSearchLocalizedTerm(searchQuery, searchContext,
            Field.CONTENT, false);
        addSearchLocalizedTerm(searchQuery, searchContext,
            Field.TITLE, false);
        addSearchTerm(searchQuery, searchContext,
            Field.USER_NAME, false);
    }
}
public class MBThreadIndexer
    extends BaseIndexer<MBThread> {
    public void postProcessContextBooleanFilter(
        BooleanFilter contextBooleanFilter,
        SearchContext searchContext) {
        contextBooleanFilter.addRequiredTerm(
            "discussion", discussion);
        if ((endDate > 0) && (startDate > 0)) {
            contextBooleanFilter.addRangeTerm(
                "lastPostDate", startDate, endDate);
        }
    }
}
Classic query types (and filters)
TermQuery / TermFilter
"term" : { "locale" : "de_DE" }
TermRangeQuery / RangeTermFilter
"range" : { "age" : { "gte" : 8, "lte" : 42 } }
WildcardQuery
"wildcard" : { "company" : "L*ray" }
StringQuery
"query_string": { "query": "(content:this OR name:this) AND (content:that OR name:that)" }
BooleanQuery / BooleanFilter
"bool" : {
    "must" : {
        "term" : { "locale" : "de_DE" }
    },
    "must_not" : {
        "range" : {
            "age" : { "from" : 8, "to" : 42 }
        }
    },
    "should" : [
        { "wildcard" : { "company" : "L*ray" } },
        { "term" : { "product" : "Portal" } }
    ]
}
Speaking to the Search Engine
public interface Query {
    public BooleanFilter getPreBooleanFilter();
    public Filter getPostFilter();
}
public interface Filter {
    public Boolean isCached();
}
public class StringQueryTranslatorImpl implements StringQueryTranslator {
    public QueryBuilder translate(StringQuery stringQuery) {
        // Elasticsearch Client Java API
        return QueryBuilders.queryStringQuery(stringQuery.getQuery());
    }
}
public class ElasticsearchIndexSearcher extends BaseIndexSearcher {
    protected SearchResponse doSearch(
        SearchContext searchContext, Query query) {
        // Elasticsearch Client Java API
        Client client = _elasticsearchConnectionManager.getClient();
        SearchRequestBuilder searchRequestBuilder = client.prepareSearch(
            getSelectedIndexNames(queryConfig, searchContext));
        QueryBuilder queryBuilder = _queryTranslator.translate(
            query, searchContext);
        searchRequestBuilder.setQuery(queryBuilder);
        SearchResponse searchResponse = searchRequestBuilder.get();
        return searchResponse;
    }
}
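The translator idea above is independent of the Elasticsearch client: each engine-neutral query object gets turned into whatever the engine understands. A dependency-free sketch that emits the JSON form directly, as an illustration only. The nested `Query`, `TermQuery` and `StringQuery` types here are hypothetical miniatures, not Liferay's classes, and the real plugin builds `QueryBuilder` objects instead of strings.

```java
class QueryTranslatorSketch {

    // Hypothetical engine-neutral query types.
    interface Query {
    }

    static final class TermQuery implements Query {
        final String field;
        final String value;

        TermQuery(String field, String value) {
            this.field = field;
            this.value = value;
        }
    }

    static final class StringQuery implements Query {
        final String query;

        StringQuery(String query) {
            this.query = query;
        }
    }

    // Minimal JSON string quoting: handles backslashes and double quotes only.
    static String quote(String text) {
        return "\"" + text.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Dispatches on the query type and emits the matching Elasticsearch clause.
    static String translate(Query query) {
        if (query instanceof TermQuery) {
            TermQuery termQuery = (TermQuery) query;
            return "{\"term\":{" + quote(termQuery.field) + ":"
                + quote(termQuery.value) + "}}";
        }
        if (query instanceof StringQuery) {
            StringQuery stringQuery = (StringQuery) query;
            return "{\"query_string\":{\"query\":" + quote(stringQuery.query) + "}}";
        }
        throw new IllegalArgumentException(
            "Unsupported query type: " + query.getClass().getName());
    }

    public static void main(String[] args) {
        System.out.println(translate(new TermQuery("locale", "de_DE")));
        System.out.println(translate(new StringQuery("content:this OR name:this")));
    }
}
```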
Search in Liferay 7
What's new in Liferay 7
Liferay 6
● Embedded Lucene by default
● Remote: Solr only
● Solr 4
● Portal-centric Lucene clustering
Liferay 7
● Embedded Elasticsearch by default
● Remote: Elasticsearch and Solr
● Solr 5.x and SolrCloud
● Native, transparent Elasticsearch clustering
● Queries + Filters + Boosting + Geolocation
● Extensibility and modularization
● Enterprise extras
○ Shield for security
○ Marvel for cluster monitoring
○ Kibana for visualization
New Queries
MatchQuery
"match" : {
    "subject" : {
        "query" : "Liferay Portal",
        "type" : "phrase"
    }
}
MoreLikeThisQuery
"more_like_this" : {
    "fields" : ["title", "content"],
    "like_text" : "Search In Liferay 7",
    "min_term_freq" : 1,
    "max_query_terms" : 12
}
DisMaxQuery
"dis_max" : {
    "tie_breaker" : 0.7,
    "queries" : [
        { "term" : { "age" : 34 } },
        { "term" : { "age" : 35 } }
    ]
}
FuzzyQuery
"fuzzy" : {
    "user" : {
        "value" : "ed",
        "fuzziness" : 2,
        "max_expansions": 100
    }
}
MatchAllQuery / MatchAllFilter
"match_all" : {
    "boost" : 1.2
}
MultiMatchQuery
"multi_match" : {
    "query": "Enterprise. Open Source. For Life",
    "type": "most_fields",
    "fields": [ "title", "title.original", "title.shingles" ]
}
New Filters
ExistsFilter
"exists" : { "field" : "emailAddress" }
MissingFilter
"missing" : { "field" : "emailAddress" }
PrefixFilter
"prefix" : { "product" : "life" }
TermsFilter
"terms" : { "locale" : ["de_DE", "pt_BR", "en_CA"] }
QueryFilter
"fquery" : {
    "query" : {
        "bool" : {
            "must" : [
                { "wildcard" : { "company" : "L*ray" } },
                { "term" : { "product" : "Portal" } }
            ]
        }
    },
    "_cache" : true
}
Geolocation filters
GeoDistanceFilter
"geo_distance" : {
    "distance" : "12km",
    "pin.location" : { "lat" : 40, "lon" : -70 }
}
GeoBoundingBoxFilter
"geo_bounding_box" : {
    "pin.location" : {
        "top_left" : { "lat" : 40.73, "lon" : -74.1 },
        "bottom_right" : { "lat" : 40.01, "lon" : -71.12 }
    }
}
GeoDistanceRangeFilter
"geo_distance_range" : {
    "from" : "200km",
    "to" : "400km",
    "pin.location" : { "lat" : 40, "lon" : -70 }
}
GeoPolygonFilter
"geo_polygon" : {
    "person.location" : {
        "points" : [ [-70, 40], [-80, 30], [-90, 20] ]
    }
}
Query-time boosting
"should": [
    {
        "match": {
            "title": {
                "query": "Liferay Portal",
                "boost": 2
            }
        }
    },
    {
        "match": {
            "content": {
                "query": "Liferay Portal"
            }
        }
    }
]
New Aggregations: Top Hits
"terms": {
    "field": "conference",
    "size": 2
},
"aggs": {
    "talks": {
        "top_hits": {
            "size" : 1,
            "sort": [
                { "attendees": { "order": "desc" } }
            ]
        }
    }
}
{
    "key": "Liferay DEVCON",
    "talks": {
        "hits": [
            { "_source": { "title": "The Power of Search" } }
        ]
    }
},
{
    "key": "Liferay North America Symposium",
    "talks": {
        "hits": [
            { "_source": { "title": "The ELK Stack" } }
        ]
    }
}
New Aggregations: Extended Stats
"extended_stats" : {
    "field" : "attendees"
}
"attendees_per_talk_stats": {
    "count": 9,
    "min": 72,
    "max": 99,
    "avg": 86,
    "sum": 774,
    "sum_of_squares": 67028,
    "std_deviation": 7.180219742846005
}
Modularity and Search
● OSGi
● Liferay's default Search Engine: now a plugin in itself
● Extension points in Search
○ Node Settings contributors → fine tune your cluster
○ Index Settings contributors → fine tune your shards and logs
○ Analyzers and Mappings contributors → fine tune your fields and queries
Liferay 7: Enter Elasticsearch
Why Elasticsearch?
Best of breed
Built for modern web applications
Distributed and clusterable by design
Lucene based
Multi-tenancy
Great vendor support
Great monitoring tools: Marvel, Logstash
Great for Developers
Open Source
Amazing documentation
High "just works" factor, e.g. zero-config indexing and clustering
REST for queries, health, admin - everything
Update live settings programmatically
Great Java Client API
Pretty JSON for talks ;-)
Clustering with Liferay and Elasticsearch
Production mode
Dev mode
Scaling and tuning made easy
Enterprise-level Search in Liferay 7 EE
Security: Shield
Protect your Liferay index with a username and password
SSL/TLS encryption for traffic within the Liferay Elasticsearch cluster
Elasticsearch plugin - no need for an external security solution
Restrict access to Liferay Portal instances with IP filtering
Monitoring: Marvel
Visualization: Kibana
Thanks and happy searching!
http://j.mp/[email protected]/arboliveira@arbocombr