Apache SOLR in AEM 6

Posted on 26-Jan-2015

DESCRIPTION

Introduction to Apache SOLR and configuring Apache SOLR with AEM 6

TRANSCRIPT

Introduction to Apache SOLR in Adobe AEM 6

Dr. Yash Mody, PhD, CTO | Tekno Point Consulting

About Me

Adobe AEM, Apache Hadoop Instructor & Consultant

Application Architecture and Design Consultant

Need I say more?

www.teknopoint.us  

Information Retrieval

- Document
- Term
- Inverted Index
- Term Frequency (tf)
- Skip Pointers
- Positional Index
- Collection Frequency
- Document Frequency (df)
- Inverse Document Frequency: idf = log10(N/df), where N is the number of documents in the collection
- Term Frequency-Inverse Document Frequency (tf-idf)

tf-idf = tf * idf
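To make these terms concrete, here is a minimal Python sketch that builds a toy inverted index (term to per-document term frequency) and scores a term with the formula above, tf-idf = tf * log10(N/df). The three sample documents and the naive whitespace tokenizer are illustrative assumptions, not Solr behavior; Lucene's own scoring uses different tf/idf variants plus normalization, so its numbers will differ.

```python
import math
from collections import Counter, defaultdict

def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency}: a tiny inverted index."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Naive tokenizer (assumption): lowercase and split on whitespace.
        # Real Lucene/Solr analyzers also handle stemming, stop words, etc.
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index

def tf_idf(index, n_docs, term, doc_id):
    postings = index.get(term, {})
    tf = postings.get(doc_id, 0)   # term frequency in this document
    df = len(postings)             # number of documents containing the term
    if tf == 0 or df == 0:
        return 0.0
    idf = math.log10(n_docs / df)  # idf = log10(N / df)
    return tf * idf                # tf-idf = tf * idf

# Illustrative documents (made up for this example).
docs = {
    "d1": "solr is search on lucene",
    "d2": "lucene is a search library",
    "d3": "aem uses solr for search",
}
index = build_inverted_index(docs)
print(sorted(index["solr"]))                              # ['d1', 'd3']
print(round(tf_idf(index, len(docs), "solr", "d1"), 4))   # 0.1761
print(tf_idf(index, len(docs), "search", "d1"))           # 0.0
```

A term that appears in every document ("search" here) gets idf = log10(3/3) = 0, so it contributes nothing to ranking; that is exactly why idf is in the formula.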

More???

PHEW! No Way

Apache SOLR

- Fire-powered Lucene
- Distributed
- Replicated
- Remote

And just for the record, it's… SEARCH On LUCENE w/ REPLICATION (TBHPHB)

Installation

- Unpack the SOLR distribution
- Add solr.war to the servlet container's webapps directory
- Add -Dsolr.solr.home = … to the JVM options
- OR use the Bitnami stack: http://bitnami.com/stack/solr

Getting solr ready

Starting SOLR (with the bundled Jetty):

  cd /usr/local/Cellar/solr/4.7.2/libexec/example/
  java -jar start.jar

Admin console: http://localhost:8983/solr/#/

Adding content using

Index and search

Indexing data:

  java -jar post.jar solr.xml

Searching

http://localhost:8983/solr/select?q=solr&wt=json
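The same search can be issued from code. This is a minimal Python sketch using only the standard library: it builds the /select URL shown above and pulls the hit count and document ids out of the JSON response. The base URL is the default example address from the slide; the sample response is hand-written in the shape Solr returns (responseHeader plus response with numFound, start and docs), and its ids are made up for offline illustration, not real results.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR_SELECT = "http://localhost:8983/solr/select"  # default example URL from the slide

def select_url(query, base=SOLR_SELECT, **params):
    """Build a /select URL; wt=json asks Solr for a JSON response."""
    return base + "?" + urlencode({"q": query, "wt": "json", **params})

def summarize(response_json):
    """Return (numFound, [ids]) from a parsed Solr JSON response."""
    body = response_json["response"]
    return body["numFound"], [doc.get("id") for doc in body["docs"]]

def search(query, **params):
    # Requires a running Solr instance; not exercised in the offline example.
    with urlopen(select_url(query, **params)) as resp:
        return summarize(json.load(resp))

# Hand-written sample in Solr's response shape (ids are hypothetical).
sample = json.loads('{"responseHeader": {"status": 0}, '
                    '"response": {"numFound": 2, "start": 0, '
                    '"docs": [{"id": "doc-1"}, {"id": "doc-2"}]}}')
print(select_url("solr"))  # http://localhost:8983/solr/select?q=solr&wt=json
print(summarize(sample))   # (2, ['doc-1', 'doc-2'])
```

Extra keyword arguments become extra query parameters, e.g. select_url("solr", rows=5) appends rows=5.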

Configurations

Configuration is done in two XML files:

- schema.xml: SOLR index configuration (field types and fields)
- solrconfig.xml: SOLR server configuration
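To show what schema.xml actually declares, here is a small Python sketch that parses a trimmed, illustrative schema.xml fragment and reports each field's type and its indexed/stored flags plus the unique key. The field names are hypothetical, and a real schema.xml also declares the field types these fields refer to; only the `<field>` and `<uniqueKey>` elements are sketched here.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative schema.xml fragment (field names are hypothetical).
SCHEMA_XML = """
<schema name="example" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="title" type="text_general" indexed="true" stored="true"/>
    <field name="body" type="text_general" indexed="true" stored="false"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</schema>
"""

def describe_schema(xml_text):
    """Return (unique key, {field name: {type, indexed, stored}})."""
    root = ET.fromstring(xml_text.strip())
    fields = {
        f.get("name"): {
            "type": f.get("type"),
            "indexed": f.get("indexed") == "true",  # searchable
            "stored": f.get("stored") == "true",    # returned in results
        }
        for f in root.iter("field")
    }
    return root.findtext("uniqueKey"), fields

key, fields = describe_schema(SCHEMA_XML)
print(key)                       # id
print(fields["body"]["stored"])  # False
```

A field that is indexed but not stored (body above) can be searched but is not returned in results, which saves space for large text.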

Indexing

- Indexing uses HTTP POST, so documents can be posted to SOLR with a web request.
- Data can also be pulled in using the Data Import Handler (from HTTP sources or a database).
- SOLR can index binary content (text plus metadata) from documents, video, MP3s, images and other binary formats.
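Because indexing is just an HTTP POST, any HTTP client can feed SOLR. The Python sketch below prepares such a request without sending it. It assumes a Solr 4.x example setup where the /update handler accepts a JSON array of documents when the Content-Type is application/json; the URL, the commit=true parameter and the document fields are illustrative assumptions.

```python
import json
from urllib.request import Request, urlopen

# Assumed default example core; commit=true makes the documents searchable at once.
UPDATE_URL = "http://localhost:8983/solr/update?commit=true&wt=json"

def build_update_request(docs, url=UPDATE_URL):
    """Prepare an HTTP POST that adds `docs` (a list of dicts) to the index."""
    body = json.dumps(docs).encode("utf-8")
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/json"})

def send(req):
    # Requires a running Solr instance; returns Solr's JSON status response.
    with urlopen(req) as resp:
        return json.load(resp)

# Hypothetical document for illustration.
req = build_update_request([{"id": "page-1", "title": "Hello SOLR"}])
print(req.get_method(), req.full_url)
```

Committing on every request keeps the example simple; for bulk loads it is usually better to batch documents and commit once at the end.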

Search

Search features: Paging, Filtering, Sorting, Faceting

Results: XML (default), JSON, PHP, Ruby, Python, etc.

Query parser: interprets the query string. Two types of query parsers:

- Lucene Query Syntax Parser
- DisMax Parser (Disjunction Max)
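Each search feature maps onto standard Solr request parameters: start/rows for paging, fq for filtering, sort for sorting, facet/facet.field for faceting, and defType=dismax with qf to switch to the DisMax parser. The Python sketch below assembles them into a query string; the helper name and the field names (title, body, category, type) are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlencode, parse_qs

def search_params(q, page=1, rows=10, fq=None, sort=None,
                  facet_fields=(), dismax_qf=None):
    """Map paging, filtering, sorting, faceting and parser choice to Solr params."""
    params = [("q", q), ("wt", "json"),
              ("start", (page - 1) * rows), ("rows", rows)]  # paging
    for f in fq or []:
        params.append(("fq", f))                 # filtering (fq may repeat)
    if sort:
        params.append(("sort", sort))            # sorting, e.g. "score desc"
    if facet_fields:
        params.append(("facet", "true"))         # faceting
        params += [("facet.field", f) for f in facet_fields]
    if dismax_qf:
        # DisMax: plain user input spread across weighted fields.
        params += [("defType", "dismax"), ("qf", dismax_qf)]
    return urlencode(params)

# Lucene syntax: field-qualified boolean query (default parser).
lucene = search_params("title:solr AND category:technology", page=2, rows=5,
                       fq=["type:page"], sort="score desc",
                       facet_fields=["category"])
# DisMax: free text, title matches weighted twice as heavily as body.
dismax = search_params("apache solr", dismax_qf="title^2 body")
print(lucene)
print(dismax)
```

The Lucene parser gives power users full boolean and field syntax, while DisMax tolerates raw end-user input and scores each document by its best-matching field, which is why it suits a site search box.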

Solr integration approaches

- Crawl using an external crawler such as Nutch or Heritrix
- CQ servlets that serialize content into Solr's input format (JSON/XML)
- A JCR observer that listens for page modifications and triggers indexing to Solr
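The JCR observer approach boils down to mapping change events onto Solr update commands. In AEM the listener itself would be Java code (the JCR API defines event types such as NODE_ADDED, PROPERTY_CHANGED and NODE_REMOVED); the Python sketch below only illustrates the mapping. The event dictionaries are a hypothetical, simplified shape (real property events carry the property path, not the page path), and the output uses Solr's JSON update command form ({"add": {"doc": …}} and {"delete": {"id": …}}), which could then be POSTed to /update.

```python
import json

# Hypothetical event shape: {"type": ..., "path": ..., "props": {...}}.
ADDED, CHANGED, REMOVED = "NODE_ADDED", "PROPERTY_CHANGED", "NODE_REMOVED"

def to_solr_command(event):
    """Translate one page-change event into a Solr JSON update command."""
    path = event["path"]
    if event["type"] in (ADDED, CHANGED):
        # Use the page path as the unique key so updates overwrite old copies.
        doc = {"id": path, **event.get("props", {})}
        return {"add": {"doc": doc}}
    if event["type"] == REMOVED:
        return {"delete": {"id": path}}
    return None  # ignore event types we do not index

def commands_for(events):
    return [c for c in map(to_solr_command, events) if c is not None]

# Illustrative events (paths and properties are made up).
events = [
    {"type": ADDED, "path": "/content/site/en/home", "props": {"title": "Home"}},
    {"type": REMOVED, "path": "/content/site/en/old"},
]
for cmd in commands_for(events):
    print(json.dumps(cmd))
```

Keying documents by page path makes re-indexing idempotent: a modified page simply replaces its earlier document.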

AEM 6

Two types of Solr server:

- Built-in (embedded)
- Remote (for distributed setups), with ZooKeeper for setting up a cluster

Shard: a horizontal partition of the index

Replication: the number of copies of the index files
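Sharding and replication can be pictured with a toy model: each document is routed to one shard by hashing its id, and each shard's copies are spread across distinct nodes. This Python sketch is a simplified illustration under stated assumptions; it uses MD5 modulo the shard count and round-robin placement, whereas SolrCloud itself routes by MurmurHash3 hash ranges and keeps the cluster layout in ZooKeeper. The node names and document id are hypothetical.

```python
import hashlib

def shard_for(doc_id, num_shards):
    """Route a document to a shard by hashing its id (horizontal partitioning)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def place_replicas(num_shards, replication_factor, nodes):
    """Assign each shard's copies to distinct nodes, round-robin."""
    if replication_factor > len(nodes):
        raise ValueError("need at least as many nodes as copies per shard")
    return {
        shard: [nodes[(shard + r) % len(nodes)] for r in range(replication_factor)]
        for shard in range(num_shards)
    }

layout = place_replicas(num_shards=2, replication_factor=2,
                        nodes=["node1", "node2", "node3"])
print(layout)  # {0: ['node1', 'node2'], 1: ['node2', 'node3']}
print(shard_for("/content/site/en/home", 2))
```

Sharding spreads the index (and query load) across machines, while replication keeps extra copies of each shard so a single node failure does not lose data or availability.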

SOLR things we didn’t see

https://github.com/evolvingweb/ajax-solr

http://wiki.apache.org/solr/SolrQuerySyntax

Thanks

@yash_mody http://www.linkedin.com/in/modyyash