Apache SOLR in AEM 6
DESCRIPTION
Introduction to Apache SOLR and configuring Apache SOLR with AEM 6
TRANSCRIPT
Introduction to Apache SOLR in Adobe AEM 6
Dr. Yash Mody, PhD CTO | Tekno Point Consulting
About Me
Adobe AEM, Apache Hadoop Instructor & Consultant
Application Architecture and Design Consultant
Need I say more?
www.teknopoint.us
Information Retrieval
Document
Term
Inverted Index
Term Frequency (tf)
Skip Pointers
Positional Index
Collection Frequency
Document Frequency (df)
Inverse Document Frequency: idf = log10(N/df), where N is the number of documents in the collection
Term Frequency Inverse Document Frequency: tf-idf = tf * idf
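The definitions on this slide can be tried out on a toy collection. The sketch below is only an illustration: the three documents and all function names are made up here, tokenization is a plain whitespace split, and it does not reproduce the more elaborate scoring that Lucene actually applies.

```python
import math
from collections import Counter

# Hypothetical toy collection; the documents are invented for illustration.
docs = {
    "d1": "solr is search on lucene",
    "d2": "lucene is a search library",
    "d3": "aem stores content in a repository",
}

def build_inverted_index(docs):
    """Map each term to the sorted list of documents that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in set(text.split()):
            index.setdefault(term, []).append(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def idf(term, docs):
    """idf = log10(N/df); returns 0.0 for a term that occurs nowhere."""
    df = len(build_inverted_index(docs).get(term, []))
    if df == 0:
        return 0.0
    return math.log10(len(docs) / df)

def tf_idf(term, doc_id, docs):
    """tf-idf = tf * idf, with tf as the raw count of the term in the document."""
    tf = Counter(docs[doc_id].split())[term]
    return tf * idf(term, docs)
```

A term that appears in a single document ("solr") scores higher there than a term shared by two documents ("search"), which is the point of weighting tf by idf.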
More???
PHEW! No Way
Apache SOLR
Fire Powered Lucene
Distributed
Replicated
Remote
And just for the record, it's… SEARCH On LUCENE w/REPLICATION (TBHPHB)
Installation
Unpack SOLR distribution
Add solr.war to webapps
Add -Dsolr.solr.home=…
OR
http://bitnami.com/stack/solr
Getting solr ready
Starting SOLR (Jetty)
cd /usr/local/Cellar/solr/4.7.2/libexec/example/
java -jar start.jar
http://localhost:8983/solr/#/
Adding content using
Index and search
Indexing Data
java -jar post.jar solr.xml
Searching
http://localhost:8983/solr/select?q=solr&wt=json
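The same search can be issued from code. The sketch below only builds the URL shown on the slide and reads document ids out of a JSON response; the function names and the sample response are made up for illustration, and the sample merely follows the general shape of a SOLR JSON response (a "response" object holding "numFound" and "docs").

```python
import json
from urllib.parse import urlencode

# Select endpoint taken from the slide; adjust host and port for a real setup.
SOLR_SELECT = "http://localhost:8983/solr/select"

def select_url(query, wt="json"):
    """Build a select URL with the q (query) and wt (response format) parameters."""
    return SOLR_SELECT + "?" + urlencode({"q": query, "wt": wt})

def doc_ids(response_text):
    """Extract the 'id' field of every document in a JSON search response."""
    body = json.loads(response_text)
    return [doc.get("id") for doc in body["response"]["docs"]]

# Hypothetical response body, invented for illustration only.
sample_response = (
    '{"responseHeader": {"status": 0},'
    ' "response": {"numFound": 1, "start": 0, "docs": [{"id": "doc-1"}]}}'
)

url = select_url("solr")
ids = doc_ids(sample_response)
```

Against a running instance, the text returned by fetching `url` (for example with `urllib.request.urlopen`) would take the place of `sample_response`.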
Configurations
Configuration is done in 2 XML files:
schema.xml: SOLR index configuration
solrconfig.xml: SOLR configuration
Indexing
Indexing uses HTTP POST, so documents to be indexed can be posted to SOLR via a web request.
Data can be pulled using the Data Import Handler (uses HTTP GET or a DB).
SOLR can index the text and metadata of binary content such as docs, video, mp3 and images.
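Posting documents over HTTP can be sketched as follows. This is a minimal illustration, assuming a SOLR 4.x instance whose /update handler accepts a JSON array of documents; the document fields ("id", "title") and the function name are hypothetical and would have to match the fields defined in schema.xml. The request is only built here, not sent.

```python
import json
from urllib.request import Request

# Assumed update endpoint on the same host as the select URL from the slides;
# commit=true asks SOLR to make the new documents searchable right away.
SOLR_UPDATE = "http://localhost:8983/solr/update?commit=true"

def build_update_request(docs):
    """Wrap a list of documents in an HTTP POST request with a JSON body."""
    payload = json.dumps(docs).encode("utf-8")
    return Request(
        SOLR_UPDATE,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical document; field names must exist in the index schema.
req = build_update_request([{"id": "page-1", "title": "Hello SOLR"}])
```

Passing `req` to `urllib.request.urlopen` would send it to a running instance.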
Search
Search features: Paging, Filtering, Sorting, Faceting
Results: xml (default), json, php, ruby, python etc.
Query Parser: used to interpret queries. 2 types of query parsers:
Lucene Query Syntax Parser
DisMax Parser (Disjunction Max)
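The search features listed above map onto request parameters: start and rows for paging, fq for filtering, sort for sorting, facet and facet.field for faceting, and defType (with qf) to select the DisMax parser. The sketch below only assembles such a URL; the function name, its arguments and the field names in the example call (cat, price, name, features) are assumptions for illustration.

```python
from urllib.parse import urlencode

# Select endpoint as used elsewhere in the slides.
SOLR_SELECT = "http://localhost:8983/solr/select"

def search_url(query, page=0, page_size=10, filters=(), sort=None,
               facet_fields=(), dismax_fields=None):
    """Build a select URL that uses paging, filtering, sorting and faceting."""
    params = [
        ("q", query),
        ("wt", "json"),
        ("start", page * page_size),  # paging: offset of the first result
        ("rows", page_size),          # paging: number of results per page
    ]
    for fq in filters:                # filtering: one fq parameter per filter
        params.append(("fq", fq))
    if sort:                          # sorting, e.g. "price asc"
        params.append(("sort", sort))
    if facet_fields:                  # faceting: counts per field value
        params.append(("facet", "true"))
        for field in facet_fields:
            params.append(("facet.field", field))
    if dismax_fields:                 # DisMax parser searching several fields
        params.append(("defType", "dismax"))
        params.append(("qf", " ".join(dismax_fields)))
    return SOLR_SELECT + "?" + urlencode(params)

# Hypothetical field names; they must exist in the index schema.
example = search_url("solr", page=2, filters=["cat:software"], sort="price asc",
                     facet_fields=["cat"], dismax_fields=["name", "features"])
```

Leaving out `dismax_fields` keeps the default Lucene query syntax parser.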
Solr integration approaches
Crawl using an external crawler like Nutch or Heritrix.
CQ servlets to serialize content into a format Solr accepts (JSON/XML).
JCR Observer for page modifications to trigger indexing to Solr.
AEM 6
2 Types
In Built
Remote (for distributed)
Zookeeper (for setting up a cluster)
Shard: horizontal partition
Replication: number of copies of the index files
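The difference between sharding and replication can be made concrete with a small sketch. It is purely illustrative: the CRC32-modulo routing below is a stand-in chosen for simplicity and is not the routing SOLR itself uses, and the shard and replica names are made up.

```python
import zlib

def shard_for(doc_id, num_shards):
    """Horizontal partitioning: each document lands in exactly one shard.

    Illustrative stand-in (CRC32 modulo shard count), not SOLR's own router.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards

def placement(num_shards, replication_factor):
    """Replication: every shard's index files exist in several copies."""
    return {
        f"shard{s + 1}": [f"shard{s + 1}_replica{r + 1}"
                          for r in range(replication_factor)]
        for s in range(num_shards)
    }

# Two shards split the index; two replicas per shard keep copies of each half.
layout = placement(num_shards=2, replication_factor=2)
```

Adding shards spreads the index over more machines, while adding replicas adds redundancy and read capacity for the same data.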
SOLR things we didn’t see
https://github.com/evolvingweb/ajax-solr
http://wiki.apache.org/solr/SolrQuerySyntax
Thanks
@yash_mody
http://www.linkedin.com/in/modyyash