flexible search in apache jackrabbit oak

17
Flexible search in Apache Jackrabbit Oak Tommaso Teofili

Upload: tommaso-teofili

Post on 02-Jul-2015

758 views

Category:

Technology


1 download

DESCRIPTION

ApacheCon EU 2014 presentation about the flexible architecture for search in Apache Jackrabbit Oak.

TRANSCRIPT

Page 1: Flexible search in Apache Jackrabbit Oak

Flexible search in Apache Jackrabbit Oak

Tommaso Teofili

Page 2: Flexible search in Apache Jackrabbit Oak

Apache Jackrabbit Oak

•  Scalable content repository •  JCR 2.0 •  Designed for concurrent access (MVCC) •  Pluggable components (storage, indexes) •  Powering AEM 6.0

18/11/14   2  

Page 3: Flexible search in Apache Jackrabbit Oak

Oak Architecture

•  Oak-JCR •  Oak-Core – MVCC (node states and immutable trees) – Core components (Security, Query engine, …) – Plugins

•  Oak-MK – Pluggable storage

18/11/14   3  

Page 4: Flexible search in Apache Jackrabbit Oak

Oak – the Query Engine

•  Query languages – XPATH – SQL-2

•  Selects the index(es) supposed to perform better – Search is demanded to the underlying indexes – No index? The repository is traversed

•  ACLs applied afterwards

18/11/14   4  

Page 5: Flexible search in Apache Jackrabbit Oak

Indexing – the IndexEditor API

•  NodeState before = builder.getNodeState(); •  builder.child(”a").setProperty(”foo", ”bar"); •  NodeState after = builder.getNodeState(); •  NodeState indexed = editorHook.processCommit(before, after, …); // who said MVCC?

18/11/14   5  

Page 6: Flexible search in Apache Jackrabbit Oak

Searching – the QueryIndex API

•  Filter filter = … ; // "select * from [nt:folder]" •  filter.restrictPath("/somenode",

Filter.PathRestriction.DIRECT_CHILDREN); •  Cursor cursor = queryIndex.query(filter,

nodeState); // search against a state •  IndexRow row = cursor.next(); // results

18/11/14   6  

Page 7: Flexible search in Apache Jackrabbit Oak

Searching – Filters

•  Full text expressions •  Property restrictions •  Path restrictions – Exact – Parent – Child – Descendant

•  Node type restrictions

18/11/14   7  

Page 8: Flexible search in Apache Jackrabbit Oak

Configuring indexes

•  Indexes are declared by adding “query index configuration” nodes in the repository – Type – Asynchronous – Reindex –  Index specific properties

18/11/14   8  

Page 9: Flexible search in Apache Jackrabbit Oak

In repository indexes

•  Data structures designed as content – Property index – Ordered property index – Node type index – Reference index

18/11/14   9  

Page 10: Flexible search in Apache Jackrabbit Oak

Lucene index

•  Full text and (sorted) property restrictions •  Stored in repository •  Tika for indexing binaries •  Configurable indexing rules (boost), codec,

analyzers

19/11/14   10  

Page 11: Flexible search in Apache Jackrabbit Oak

Lucene index

•  Interesting facts – DocValues for sorted property restrictions – Uncompressed stored fields – Property exists queries •  TermRange vs Wildcard vs Term vs MatchAll

+FieldExistsFilter

19/11/14   11  

Page 12: Flexible search in Apache Jackrabbit Oak

Solr index

•  Full text, property, path restrictions •  Embedded or remote Solr(Cloud) •  Configurable – Mapping restriction / fields – Page size – Commit policy

•  Most is configured on the Solr side

18/11/14   12  

Page 13: Flexible search in Apache Jackrabbit Oak

Problems

•  Hard to express complex queries •  Cannot leverage underlying indexes

advanced capabilities

18/11/14   13  

Page 14: Flexible search in Apache Jackrabbit Oak

Native language support

•  Leverage underlying index capabilities – Multiple query languages/parsers

•  More accurate full text queries (and results) – … where native(’lucene', 'name:(hello world)

“hello world”^3') •  Advanced index capabilities (e.g. MLT) – … where native('solr', 'mlt?q=path:/content/

sample1&mlt.fl=jcr:title') 19/11/14   14  

Page 15: Flexible search in Apache Jackrabbit Oak

Adding more indexes

•  Create an IndexEditor – Turn diff into an “indexable”

•  Create a QueryIndex – Turn a Filter into an index-specific query

•  “Declare” the index

18/11/14   15  

Page 16: Flexible search in Apache Jackrabbit Oak

Looking forward

•  Results aggregation features (e.g. facets) •  More configuration options (Lucene, Solr) •  Smarter index selection •  Cover indexes

18/11/14   16  

Page 17: Flexible search in Apache Jackrabbit Oak

Thanks