apache solr liferay


Uploaded by: binesh-gummadi

Posted on 26-Jan-2015


DESCRIPTION

A 2009 presentation that I just found in my archives

TRANSCRIPT

Page 1: Apache solr liferay

Apache Solr
Enterprise search platform from the Apache Lucene project

Rivet Logic Corporation
1800 Alexander Bell Drive, Suite 400
Reston, VA 20191
Ph: 703.955.3480 Fax: 703.234.7711

Page 2: Apache solr liferay

What is Solr?

● Search server
● Built upon Apache Lucene
● Very fast
● Scalable in both query load and collection size
● Interoperable
● Extensible
● Lucene power exposed over HTTP
● Spell checking, highlighting, faceting, etc.
● Caching
● Replication
● Distributed search

Page 3: Apache solr liferay

How does it work?

Page 4: Apache solr liferay

schema.xml

● Field types
○ <fieldType name="text" class="solr.TextField" indexed="true" />

● Fields
○ <field name="technologies" type="text" indexed="true" stored="true" multiValued="true"/>

● Unique key (optional)
○ <uniqueKey>id</uniqueKey>

● Copy fields
○ <copyField source="developers" dest="df"/>

● Dynamic fields
○ <dynamicField name="*_dt" type="date" indexed="true" stored="true"/>

● Similarity configuration
○ Similarity is the scoring routine that rates each document against a query
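The following sketch shows how a field name such as created_dt could be matched against <dynamicField> patterns. It is a minimal Java example, and the class DynamicFields is hypothetical, not a Solr API. It follows the rule described in the comments of Solr's example schema.xml: a pattern has a single '*' at the start or the end, longer patterns are tried first, and when two patterns of equal length both match, the one declared first in the schema wins.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not a Solr API: resolves a field name against
// <dynamicField> patterns such as "*_dt".
final class DynamicFields {
    private final List<String> patterns;

    DynamicFields(List<String> patternsInSchemaOrder) {
        patterns = new ArrayList<>(patternsInSchemaOrder);
        // List.sort is stable: longer patterns first, equal lengths keep schema order.
        patterns.sort(Comparator.comparingInt(String::length).reversed());
    }

    // A pattern has a single '*' at the start ("*_dt") or at the end ("attr_*").
    static boolean matches(String pattern, String field) {
        if (pattern.startsWith("*")) {
            return field.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            return field.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(field);
    }

    // Returns the first (that is, the longest) matching pattern, if any.
    Optional<String> resolve(String field) {
        for (String p : patterns) {
            if (matches(p, field)) {
                return Optional.of(p);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        DynamicFields schema = new DynamicFields(List.of("*", "*_s", "*_dt"));
        System.out.println(schema.resolve("created_dt").orElse("none")); // *_dt
        System.out.println(schema.resolve("title").orElse("none"));      // *
    }
}
```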

Page 5: Apache solr liferay

solrconfig.xml

● Lucene indexing parameters
○ <mergeFactor>10</mergeFactor>
○ <ramBufferSizeMB>32</ramBufferSizeMB>

● Cache settings
○ <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>

● Request handler configuration
○ <requestHandler name="dismax" class="solr.SearchHandler" >

● HTTP cache settings
○ <httpCaching lastModifiedFrom="openTime" etagSeed="Solr">

● Search components, response writers, query parsers
○ <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
○ <queryResponseWriter name="velocity" class="org.apache.solr.request.VelocityResponseWriter"/>
○ <queryParser name="lucene" class="org.apache.solr.search.LuceneQParserPlugin"/>
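The queryResultCache above is configured as solr.LRUCache with size="512". The sketch below is not Solr's implementation. It is a plain LinkedHashMap-based LRU cache (the name LruCache is hypothetical) that only illustrates the eviction policy that "size" bounds: once the cache is full, the entry used longest ago is dropped. It does not model autowarmCount, which sets how many entries are pre-populated from the old cache when a new searcher opens.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration of least-recently-used eviction, the policy named by
// class="solr.LRUCache". Not Solr's implementation.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    LruCache(int initialSize, int maxSize) {
        // accessOrder = true: get() moves an entry to the most-recent end.
        super(initialSize, 0.75f, true);
        this.maxSize = maxSize;
    }

    // Called by put(); evicts the least recently used entry once over maxSize.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize;
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2, 2);
        cache.put("q=solr", "results A");
        cache.put("q=lucene", "results B");
        cache.get("q=solr");                 // q=lucene is now least recently used
        cache.put("q=liferay", "results C"); // evicts q=lucene
        System.out.println(cache.keySet());  // [q=solr, q=liferay]
    }
}
```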

Page 6: Apache solr liferay

Request Handler

<requestHandler name="/itas" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="v.template">browse</str>
    <str name="v.properties">velocity.properties</str>
    <str name="title">Solritas</str>
    <str name="wt">velocity</str>
    <str name="defType">dismax</str>
    <str name="q.alt">*:*</str>
    <str name="rows">10</str>
    <str name="fl">*,score</str>
    <str name="facet">on</str>
    <str name="facet.field">df</str>
    <str name="facet.mincount">1</str>
    <str name="hl">true</str>
    <str name="hl.fl">developers</str>
    <str name="qf">
      text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
    </str>
  </lst>
</requestHandler>
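The qf default above assigns a boost to each field. This Java sketch shows roughly how the dismax parser combines per-field matches for a single-term query. Each field's score is multiplied by its boost and the best field wins. A "tie" parameter, which this handler does not set, then adds a fraction of the remaining fields' scores. This is the disjunction-max combination that Lucene's DisjunctionMaxQuery uses. The class name DismaxSketch and the raw per-field scores are made up for illustration; real scores come from Lucene's similarity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of dismax scoring for one query term.
final class DismaxSketch {

    // Parses "text^0.5 name^1.2 id" into field -> boost (a missing boost means 1.0).
    static Map<String, Double> parseQf(String qf) {
        Map<String, Double> boosts = new LinkedHashMap<>();
        for (String token : qf.trim().split("\\s+")) {
            int caret = token.indexOf('^');
            if (caret < 0) {
                boosts.put(token, 1.0);
            } else {
                boosts.put(token.substring(0, caret), Double.parseDouble(token.substring(caret + 1)));
            }
        }
        return boosts;
    }

    // max(boosted field scores) + tie * (sum of the other boosted field scores).
    static double score(Map<String, Double> boosts, Map<String, Double> rawScores, double tie) {
        double max = 0.0;
        double sum = 0.0;
        for (Map.Entry<String, Double> e : boosts.entrySet()) {
            double s = e.getValue() * rawScores.getOrDefault(e.getKey(), 0.0);
            max = Math.max(max, s);
            sum += s;
        }
        return max + tie * (sum - max);
    }

    public static void main(String[] args) {
        Map<String, Double> qf = parseQf(" text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4 ");
        // Made-up raw matches: the term hits only "name" and "text".
        Map<String, Double> raw = Map.of("name", 2.0, "text", 1.0);
        System.out.println(score(qf, raw, 0.0)); // 2.4: only the best field (name) counts
        System.out.println(score(qf, raw, 0.1)); // text now adds a little on top
    }
}
```

With tie=0.0 the query behaves as a pure "best field wins" search. Small values such as 0.1 reward documents that match in several fields.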

Page 7: Apache solr liferay

Response Writer

● A Response Writer generates the formatted response of a search.

● The wt parameter selects the Response Writer to use.

● Available writers include json, php, phps, python, ruby, xml, xslt and velocity.

<queryResponseWriter name="xslt" class="org.apache.solr.request.XSLTResponseWriter">
  <int name="xsltCacheLifetimeSeconds">5</int>
</queryResponseWriter>
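This Java sketch illustrates the wt lookup. Writers are registered by name, as <queryResponseWriter name="..."> does, and one is chosen per request. When wt is absent, XML is used, which is Solr's default writer. The class name ResponseWriters and its two toy writers are hypothetical, and their output is not Solr's real response format. Throwing an exception on an unknown wt is a choice made for this sketch, not necessarily what Solr does.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry of response writers keyed by the wt parameter.
final class ResponseWriters {
    private static final Map<String, Function<Map<String, Integer>, String>> WRITERS = Map.of(
        "xml", r -> "<response><int name=\"numFound\">" + r.get("numFound") + "</int></response>",
        "json", r -> "{\"numFound\":" + r.get("numFound") + "}");

    // Picks the writer named by wt (XML when wt is missing) and renders the result.
    static String write(Map<String, String> params, Map<String, Integer> result) {
        String wt = params.getOrDefault("wt", "xml");
        Function<Map<String, Integer>, String> writer = WRITERS.get(wt);
        if (writer == null) {
            throw new IllegalArgumentException("Unknown response writer: " + wt);
        }
        return writer.apply(result);
    }

    public static void main(String[] args) {
        Map<String, Integer> result = Map.of("numFound", 3);
        System.out.println(write(Map.of("q", "binesh"), result));              // XML form
        System.out.println(write(Map.of("q", "binesh", "wt", "json"), result)); // JSON form
    }
}
```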

Page 8: Apache solr liferay

Analyzers, Tokenizers, Filters

● The Analyzer class is a native Lucene concept that determines how tokens are produced from a piece of text

<fieldType name="nametext" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.WhitespaceAnalyzer"/>
</fieldType>

● The job of a tokenizer is to break up a stream of text into tokens

<fieldType name="text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
  </analyzer>
</fieldType>

● A token filter looks at each token in the stream sequentially and decides whether to pass it along, replace it or discard it
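The tokenizer/filter split can be sketched in plain Java. These are ordinary functions in a hypothetical AnalysisChain class, not Lucene's TokenStream API. A whitespace tokenizer produces the tokens. A lower-casing filter replaces each token, and a stop-word filter discards some tokens and passes the rest along. The stop-word list is an arbitrary example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical analysis chain: one tokenizer followed by filters, applied in order.
final class AnalysisChain {

    // Tokenizer: splits on runs of whitespace (roughly what WhitespaceAnalyzer does).
    static List<String> whitespaceTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Filter that REPLACES every token with its lower-cased form.
    static List<String> lowerCaseFilter(List<String> in) {
        List<String> out = new ArrayList<>();
        for (String t : in) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Filter that DISCARDS stop words and PASSES everything else along.
    static List<String> stopFilter(List<String> in, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String t : in) {
            if (!stopWords.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    // The chain: tokenizer first, then filters in declaration order.
    static List<String> analyze(String text) {
        List<String> tokens = whitespaceTokenize(text);
        tokens = lowerCaseFilter(tokens);
        return stopFilter(tokens, Set.of("a", "of", "the"));
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Power of   Apache Solr")); // [power, apache, solr]
    }
}
```

Filter order matters here. "The" is removed only because lower-casing runs before the stop-word filter.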

Page 9: Apache solr liferay

Other features

● Highlighting
○ &hl=true&hl.fl=developers

● Synonyms
○ <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>

● Spell check
○ The spell check component can return a list of alternative spelling suggestions.
○ <searchComponent name="spellcheck" class="solr.SpellCheckComponent">

● Content Streams
○ Allow the Solr server to fetch local or remote data itself. Remote streaming must be enabled in solrconfig.xml (enableRemoteStreaming="true" on <requestParsers>).

● Solr Cell
○ Leveraging Apache Tika, extracts and indexes rich documents such as Word, PDF, HTML and many other types

● More like this
○ http://wiki.apache.org/solr/MoreLikeThis
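For the Synonyms entry, here is a simplified Java sketch of how synonyms.txt lines are interpreted. It handles single-word entries only. A line such as "a, b, c" lists equivalent terms. With expand="true" each term maps to all of them, and with expand="false" each maps to the first. A line such as "x => y" is a one-way mapping, and ignoreCase="true" makes lookups case-insensitive. The Synonyms class and the sample lines are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Simplified, single-word-only model of synonyms.txt handling (hypothetical class).
final class Synonyms {
    private final Map<String, Set<String>> map = new HashMap<>();
    private final boolean ignoreCase;

    Synonyms(List<String> lines, boolean ignoreCase, boolean expand) {
        this.ignoreCase = ignoreCase;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            List<String> sources;
            List<String> targets;
            if (line.contains("=>")) {
                // Explicit one-way mapping "x => y"; expand does not apply.
                String[] sides = line.split("=>", 2);
                sources = terms(sides[0]);
                targets = terms(sides[1]);
            } else {
                // Equivalence list "a, b, c".
                sources = terms(line);
                if (sources.isEmpty()) {
                    continue;
                }
                targets = expand ? sources : sources.subList(0, 1);
            }
            for (String from : sources) {
                map.computeIfAbsent(from, k -> new LinkedHashSet<>()).addAll(targets);
            }
        }
    }

    // Splits a comma-separated list, trimming and optionally lower-casing each term.
    private List<String> terms(String csv) {
        List<String> out = new ArrayList<>();
        for (String t : csv.split(",")) {
            String term = t.trim();
            if (!term.isEmpty()) {
                out.add(ignoreCase ? term.toLowerCase(Locale.ROOT) : term);
            }
        }
        return out;
    }

    // Returns the tokens that a single input token is replaced with.
    Set<String> lookup(String token) {
        String key = ignoreCase ? token.toLowerCase(Locale.ROOT) : token;
        return map.getOrDefault(key, Set.of(token));
    }

    public static void main(String[] args) {
        Synonyms s = new Synonyms(List.of("# people", "binesh, binny", "tv => television"), true, true);
        System.out.println(s.lookup("Binny")); // [binesh, binny]
        System.out.println(s.lookup("TV"));    // [television]
        System.out.println(s.lookup("solr"));  // [solr]
    }
}
```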

Page 10: Apache solr liferay

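Indexing with SolrJ

import java.net.URL;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

SolrServer solr = new CommonsHttpSolrServer(new URL("http://localhost:8983/solr"));
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "EXAMPLEDOC01");
doc.addField("title", "NOVAJUG SolrJ Example");
solr.add(doc);
solr.commit();   // after a batch, not per document
solr.optimize(); // periodically, if/when needed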
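The advice "after a batch, not per document" can be demonstrated without SolrJ on the classpath. In this sketch, Indexer is a hypothetical stand-in for SolrServer, which does offer add(Collection) and commit(). Documents are plain maps, and the batch size of 1000 is an arbitrary example value. BatchIndexing and CountingIndexer are also hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: send documents in batches, commit once at the end.
final class BatchIndexing {

    // Stand-in for SolrJ's SolrServer (add a collection, then commit).
    interface Indexer {
        void add(List<Map<String, String>> docs);
        void commit();
    }

    static void indexAll(Indexer indexer, List<Map<String, String>> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        for (int from = 0; from < docs.size(); from += batchSize) {
            int to = Math.min(from + batchSize, docs.size());
            indexer.add(new ArrayList<>(docs.subList(from, to)));
        }
        indexer.commit(); // one commit for the whole run, not one per document
    }

    // Test double that only counts calls.
    static final class CountingIndexer implements Indexer {
        int adds;
        int docsAdded;
        int commits;

        @Override
        public void add(List<Map<String, String>> docs) {
            adds++;
            docsAdded += docs.size();
        }

        @Override
        public void commit() {
            commits++;
        }
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            docs.add(Map.of("id", "EXAMPLEDOC" + i, "title", "Doc " + i));
        }
        CountingIndexer indexer = new CountingIndexer();
        indexAll(indexer, docs, 1000);
        System.out.println(indexer.adds + " adds, " + indexer.commits + " commit"); // 3 adds, 1 commit
    }
}
```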

Page 11: Apache solr liferay

Data Import Handler

● Indexes relational database, XML data, and e-mail sources

● Supports full and incremental/delta indexing

● Highly extensible with custom data sources, transformers, etc.

● http://wiki.apache.org/solr/DataImportHandler
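Here is a Java sketch of full versus delta indexing under simplifying assumptions. Each row carries a last-modified timestamp. A full import takes every row, and a delta import takes only rows changed since the previous run started. This is the role ${dataimporter.last_index_time} plays inside a DIH deltaQuery. The names DeltaImport and Row are hypothetical, and the in-memory list stands in for a database table.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of DIH full vs. delta imports over an in-memory "table".
final class DeltaImport {

    record Row(String id, Instant lastModified) { }

    // Plays the role of ${dataimporter.last_index_time}.
    private Instant lastIndexTime = Instant.EPOCH;

    // Full import: every row, then remember when this run started.
    List<Row> fullImport(List<Row> table, Instant runStartedAt) {
        lastIndexTime = runStartedAt;
        return new ArrayList<>(table);
    }

    // Delta import: only rows modified after the previous run started.
    List<Row> deltaImport(List<Row> table, Instant runStartedAt) {
        List<Row> changed = new ArrayList<>();
        for (Row r : table) {
            if (r.lastModified().isAfter(lastIndexTime)) {
                changed.add(r);
            }
        }
        lastIndexTime = runStartedAt;
        return changed;
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2009-06-01T00:00:00Z");
        List<Row> table = List.of(new Row("1", t0), new Row("2", t0.plusSeconds(3600)));
        DeltaImport dih = new DeltaImport();
        System.out.println(dih.fullImport(table, t0.plusSeconds(60)).size());    // 2
        System.out.println(dih.deltaImport(table, t0.plusSeconds(7200)).size()); // 1
    }
}
```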

Page 12: Apache solr liferay

Replication

● The master is polled by its replicas

● Each replica pulls the Lucene index and, optionally, the Solr configuration files

● Query throughput scaling: replicate and load balance

● http://wiki.apache.org/solr/SolrReplication
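Once every replica holds a copy of the master's index, read queries can be spread across the replicas. This minimal round-robin picker in Java shows the idea. The replica URLs and the RoundRobin class are hypothetical, and in practice this job is often handled by a separate load balancer.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin choice among replicas that share the same index.
final class RoundRobin {
    private final List<String> replicas;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobin(List<String> replicas) {
        if (replicas.isEmpty()) {
            throw new IllegalArgumentException("need at least one replica");
        }
        this.replicas = List.copyOf(replicas);
    }

    // Thread-safe; floorMod keeps the index valid even after the counter overflows.
    String pick() {
        int i = Math.floorMod(next.getAndIncrement(), replicas.size());
        return replicas.get(i);
    }

    public static void main(String[] args) {
        RoundRobin lb = new RoundRobin(List.of(
            "http://replica1:8983/solr", "http://replica2:8983/solr"));
        for (int q = 0; q < 3; q++) {
            System.out.println(lb.pick() + "/select/?q=solr");
        }
    }
}
```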

Page 13: Apache solr liferay

Demo

● Download Solr
○ http://mirrors.ibiblio.org/pub/mirrors/apache/lucene/solr/1.4.0/

● Start Solr
○ cd <solr_home>/example
○ java -jar start.jar

● Post documents
○ cd <solr_home>/example/exampledocs
○ java -jar post.jar *.xml
○ java -jar post.jar cw.xml

● Access Solr
○ http://localhost:8983/solr/admin/

● Query Solr
○ http://localhost:8983/solr/select/?q=binesh
○ http://localhost:8983/solr/select/?q=binny
○ http://localhost:8983/solr/select/?q=binesh&facet=true&facet.field=df&facet.mincount=1
○ http://localhost:8983/solr/itas/

● Luke
○ http://www.getopt.org/luke/
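The faceted query in the demo can also be built programmatically. This Java sketch uses a hypothetical SelectUrl helper that URL-encodes each parameter and keeps insertion order, so the result matches the hand-written demo URL.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that assembles a /select URL from ordered parameters.
final class SelectUrl {

    static String build(String base, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&", base + "?", "");
        for (Map.Entry<String, String> p : params.entrySet()) {
            // URLEncoder uses form encoding: space becomes '+', ':' becomes %3A.
            query.add(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "binesh");
        params.put("facet", "true");
        params.put("facet.field", "df");
        params.put("facet.mincount", "1");
        System.out.println(build("http://localhost:8983/solr/select/", params));
        // http://localhost:8983/solr/select/?q=binesh&facet=true&facet.field=df&facet.mincount=1
    }
}
```

A Map cannot carry repeated parameters, such as several facet.field values. Those would need a list of name/value pairs instead.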

Page 14: Apache solr liferay

Liferay + Solr: Motivation

● Centralizing the search index in a clustered Liferay environment

● Performance improvement
○ Re-indexing costs too much for large DBs
○ The indexes of the Liferay nodes in a cluster are often out of sync

Page 15: Apache solr liferay

Liferay + Solr: Configuration 1

Install Solr (http://lucene.apache.org/solr)

Set up environment variables
● SOLR_HOME = /${solr installed folder}
● JAVA_OPTS = "$JAVA_OPTS -Dsolr.solr.home=$SOLR_HOME/example/solr/data"

solr.xml
● Place this file under ${tomcat}/conf/Catalina/localhost/ with the following content (write the actual path in place of $SOLR_HOME; Tomcat does not expand shell environment variables in this file)

<?xml version="1.0" encoding="utf-8"?>
<Context docBase="$SOLR_HOME/apache-solr-1.4.0.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="$SOLR_HOME" override="true" />
</Context>
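Both JAVA_OPTS and the solr.xml context file set a Solr home, so it helps to know which one wins. As we understand Solr 1.4's SolrResourceLoader, it checks three sources in order: the JNDI entry java:comp/env/solr/home (the <Environment name="solr/home"> entry above), then the solr.solr.home system property, then the relative directory solr/. The sketch below models that order with plain Optional inputs so it runs outside a servlet container. SolrHomeLocator is a hypothetical name.

```java
import java.util.Optional;

// Hypothetical model of the Solr home lookup order (JNDI, system property, default).
final class SolrHomeLocator {

    static String locate(Optional<String> jndiSolrHome, Optional<String> systemProperty) {
        return jndiSolrHome                  // 1. java:comp/env/solr/home from the context file
            .or(() -> systemProperty)        // 2. -Dsolr.solr.home from JAVA_OPTS
            .orElse("solr/");                // 3. relative to the working directory
    }

    public static void main(String[] args) {
        Optional<String> fromJavaOpts = Optional.ofNullable(System.getProperty("solr.solr.home"));
        System.out.println(locate(Optional.of("/opt/solr"), fromJavaOpts)); // /opt/solr
        System.out.println(locate(Optional.empty(), Optional.empty()));      // solr/
    }
}
```

If this order holds, the value in the context file takes precedence over -Dsolr.solr.home.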

Page 16: Apache solr liferay

Liferay + Solr: Configuration 2

schema.xml
● This file tells Solr how to index the data coming from Liferay, and can be customized for your installation.
● Copy this file from the solr-web plugin to $SOLR_HOME/conf in your Solr home folder (you may have to create the conf directory).

...
<fields>
  <field name="comments" type="text" indexed="true" stored="true" />
  <field name="content" type="text" indexed="true" stored="true" />
  <field name="description" type="text" indexed="true" stored="true" />
  <field name="name" type="text" indexed="true" stored="true" />
  <field name="properties" type="text" indexed="true" stored="true" />
  <field name="title" type="text" indexed="true" stored="true" />
  <field name="uid" type="string" indexed="true" stored="true" />
  <field name="url" type="text" indexed="true" stored="true" />
  <field name="userName" type="text" indexed="true" stored="true" />
  <field name="version" type="text" indexed="true" stored="true" />
  <dynamicField name="*" type="string" indexed="true" stored="true" />
</fields>
<uniqueKey>uid</uniqueKey>
<defaultSearchField>content</defaultSearchField>
...
<copyField source="comments" dest="content"/>
...
...
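Two rules in this schema do most of the work. First, <copyField source="comments" dest="content"/> copies comment text into content, the default search field. Second, <uniqueKey>uid</uniqueKey> makes a second add with the same uid replace the first document. This toy in-memory model in Java demonstrates both effects. ToyIndex is hypothetical, and it ignores analysis, scoring and the multiValued setting.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model (hypothetical, not Solr/Lucene internals) of copyField and uniqueKey.
final class ToyIndex {
    private final Map<String, String> copyFields; // source field -> dest field
    private final Map<String, Map<String, List<String>>> docsByUid = new LinkedHashMap<>();

    ToyIndex(Map<String, String> copyFields) {
        this.copyFields = copyFields;
    }

    void add(Map<String, String> doc) {
        String uid = doc.get("uid");
        if (uid == null) {
            throw new IllegalArgumentException("document has no uniqueKey field 'uid'");
        }
        Map<String, List<String>> fields = new HashMap<>();
        for (Map.Entry<String, String> f : doc.entrySet()) {
            fields.computeIfAbsent(f.getKey(), k -> new ArrayList<>()).add(f.getValue());
            String dest = copyFields.get(f.getKey());
            if (dest != null) { // copyField: the source value is also added to dest
                fields.computeIfAbsent(dest, k -> new ArrayList<>()).add(f.getValue());
            }
        }
        docsByUid.put(uid, fields); // same uid: the new document replaces the old one
    }

    // Naive substring search over the default search field "content".
    List<String> searchContent(String word) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, List<String>>> d : docsByUid.entrySet()) {
            for (String value : d.getValue().getOrDefault("content", List.of())) {
                if (value.contains(word)) {
                    hits.add(d.getKey());
                    break;
                }
            }
        }
        return hits;
    }

    int size() {
        return docsByUid.size();
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex(Map.of("comments", "content"));
        index.add(Map.of("uid", "blog_1", "comments", "great solr post"));
        index.add(Map.of("uid", "wiki_7", "content", "lucene internals"));
        System.out.println(index.searchContent("solr")); // [blog_1], found through copyField
        index.add(Map.of("uid", "blog_1", "comments", "edited"));
        System.out.println(index.searchContent("solr")); // []
        System.out.println(index.size());                // 2
    }
}
```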

Page 17: Apache solr liferay

Liferay + Solr: Configuration 3

Copy WAR file
● Copy the WAR file $SOLR_HOME/dist/apache-solr-${solr.version}.war into $SOLR_HOME/example, where ${solr.version} is the Solr version number, e.g., 1.4.0.

Start Liferay/Tomcat
● Solr will be picked up, and "solr" will be deployed automatically under the ${tomcat}/webapps folder

Install the solr-web Liferay plugin
● The latest Liferay plugin can be checked out from the following location: http://svn.liferay.com/repos/public/plugins/trunk/webs/solr-web
● Build the checked-out plugin and deploy it

Page 18: Apache solr liferay

Liferay + Solr: Configuration 4

Final Step
● We need to rebuild the Liferay search indexes
● Control Panel > Server Administration

Page 19: Apache solr liferay

Liferay + Solr: How it works

...
<bean id="solrServer" class="com.liferay.portal.search.solr.server.BasicAuthSolrServer">
  <constructor-arg type="java.lang.String" value="http://localhost:8080/solr" />
</bean>
<bean id="indexSearcher.solr" class="com.liferay.portal.search.solr.SolrIndexSearcherImpl">
  <property name="solrServer" ref="solrServer" />
</bean>
<bean id="indexWriter.solr" class="com.liferay.portal.search.solr.SolrIndexWriterImpl">
  <property name="commit" value="true" />
  <property name="solrServer" ref="solrServer" />
</bean>
...

solr-spring.xml (from solr-web plugin)
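The Spring wiring above injects a single solrServer bean into both the index searcher and the index writer. This Java sketch mirrors that shape with hypothetical classes that are not Liferay's API. Each simulated Liferay node gets the same shared server, so a document indexed by one node is found by another. A node with its own private index (node3) does not see the document, which is the out-of-sync problem from the motivation slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical classes mirroring the shape of solr-spring.xml; not Liferay's API.
final class CentralIndexWiring {

    // Stand-in for the remote Solr server (the "solrServer" bean).
    static final class SharedServer {
        private final List<String> docs = new CopyOnWriteArrayList<>();

        void add(String doc) {
            docs.add(doc);
        }

        List<String> search(String word) {
            List<String> hits = new ArrayList<>();
            for (String d : docs) {
                if (d.contains(word)) {
                    hits.add(d);
                }
            }
            return hits;
        }
    }

    // One Liferay node: its writer and searcher both use the injected server,
    // like <property name="solrServer" ref="solrServer"/> on both beans.
    static final class LiferayNode {
        private final SharedServer solrServer;

        LiferayNode(SharedServer solrServer) {
            this.solrServer = solrServer;
        }

        void index(String doc) {
            solrServer.add(doc);
        }

        List<String> search(String word) {
            return solrServer.search(word);
        }
    }

    public static void main(String[] args) {
        SharedServer central = new SharedServer();
        LiferayNode node1 = new LiferayNode(central);
        LiferayNode node2 = new LiferayNode(central);
        LiferayNode node3 = new LiferayNode(new SharedServer()); // private, unsynchronized index

        node1.index("Solr setup notes");
        System.out.println(node2.search("Solr")); // [Solr setup notes]
        System.out.println(node3.search("Solr")); // []
    }
}
```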

Page 20: Apache solr liferay

Liferay + Solr: Back to the default?

● Simply undeploy the solr-web plugin
● Rebuild the search indexes from the Control Panel, as described in Configuration 4