apache solr + ajax solr

Upload: net7

Post on 18-Jun-2015

Category:

Technology


DESCRIPTION

Apache Solr and ajax-solr overview

TRANSCRIPT

Page 1: Apache Solr + ajax solr

+ ajax-solr

Page 2: Apache Solr + ajax solr

Solr (pronounced "solar") is an open source enterprise search platform from the Apache Lucene project. Its major features include full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. [...] Solr is the most popular enterprise search engine. Solr 4 adds NoSQL features.

What is Solr (1)

Page 3: Apache Solr + ajax solr

Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Apache Tomcat or Jetty. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it usable from most popular programming languages.

(source: wikipedia)

What is Solr (2)

Page 4: Apache Solr + ajax solr

Installing Solr (speedrun)

# tar zxvf solr-4.10.0.tgz

# mv solr-4.10.0 /opt/

Page 5: Apache Solr + ajax solr

Solr comes already configured. To fine-tune the Solr fields, modify the file:

/opt/solr-4.10.0/example/solr/collection1/conf/schema.xml

Configuring Solr

Page 6: Apache Solr + ajax solr

Running Solr - Jetty

Solr includes a configured Jetty installation. To run it:

# cd /opt/solr-4.10.0/example/
# java -jar start.jar

Page 7: Apache Solr + ajax solr

Create a Tomcat context file:

/var/lib/tomcat6/conf/Catalina/localhost/mysolr.xml

(better yet, create it somewhere else and link it there)

Running Solr - Tomcat (1)

Page 8: Apache Solr + ajax solr

Context file content:

<?xml version="1.0" encoding="utf-8"?>
<Context docBase="/opt/solr-4.10.0/example/webapps/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/opt/solr-4.10.0/example/solr" override="true"/>
</Context>

Running Solr - Tomcat (2)

Page 9: Apache Solr + ajax solr

Tomcat: find the app “mysolr” in the manager webapp

Jetty: http://localhost:8983/solr/

Solr Admin interface

Page 10: Apache Solr + ajax solr

Untar solr-4.10.0.tgz, then copy the extracted directory once per instance, giving each copy a different name.

# tar zxvf solr-4.10.0.tgz
# cp -a solr-4.10.0 /opt/mysolr1
# cp -a solr-4.10.0 /opt/mysolr2

Multiple Solr instances (1)

Page 11: Apache Solr + ajax solr

Create multiple context files with different names. Each of them must point to a different Solr installation.

/var/lib/tomcat6/conf/Catalina/localhost/solrApp1.xml
/var/lib/tomcat6/conf/Catalina/localhost/solrApp2.xml

Multiple Solr instances (2)

Page 12: Apache Solr + ajax solr

Change the content of each context file to point to the right Solr installation. File solrApp1.xml:

<?xml version="1.0" encoding="utf-8"?>
<Context docBase="/opt/mysolr1/example/webapps/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/opt/mysolr1/example/solr" override="true"/>
</Context>

Multiple Solr instances (3)

Page 13: Apache Solr + ajax solr

Change the content of each context file to point to the right Solr installation. File solrApp2.xml:

<?xml version="1.0" encoding="utf-8"?>
<Context docBase="/opt/mysolr2/example/webapps/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/opt/mysolr2/example/solr" override="true"/>
</Context>

Multiple Solr instances (4)
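Since the context files differ only in the installation path, they can be generated rather than edited by hand. The following is a minimal sketch, assuming the directory layout shown in the slides; the `contextFileFor` helper is hypothetical and only builds the file content as a string.

```javascript
// Hypothetical helper: builds the Tomcat context file content
// for one Solr installation directory (e.g. /opt/mysolr1).
function contextFileFor(installDir) {
  return [
    '<?xml version="1.0" encoding="utf-8"?>',
    '<Context docBase="' + installDir + '/example/webapps/solr.war" debug="0" crossContext="true">',
    '  <Environment name="solr/home" type="java.lang.String" value="' +
      installDir + '/example/solr" override="true"/>',
    '</Context>'
  ].join('\n');
}

// Print the content for the two instances used in the slides.
['/opt/mysolr1', '/opt/mysolr2'].forEach(function (dir) {
  console.log(contextFileFor(dir) + '\n');
});
```

Each printed block would then be saved as solrApp1.xml and solrApp2.xml respectively.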

Page 14: Apache Solr + ajax solr

The Solr schema must have a unique field, identified in the schema.xml file by something like:

<uniqueKey>id</uniqueKey>

where id is the field defined by:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />

Solr Schema (1)

Page 15: Apache Solr + ajax solr

For most of the other fields of the indexed resources, you will use the Solr dynamic fields defined in the schema, like the following ones:

<dynamicField name="*_i" type="int" indexed="true" stored="true"/>
<dynamicField name="*_is" type="int" indexed="true" stored="true" multiValued="true"/>

Solr Schema (2)
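To make the naming convention concrete, here is a rough sketch of how a field name maps onto the two dynamic field patterns above. It is a simplified model written for illustration, not Solr's own code: the `resolveDynamicField` helper and the sample field names are hypothetical, and only patterns with a leading asterisk are handled.

```javascript
// Simplified model of the two dynamicField declarations above.
const dynamicFields = [
  { pattern: '*_i', type: 'int', multiValued: false },
  { pattern: '*_is', type: 'int', multiValued: true }
];

// Hypothetical helper: returns the declaration whose suffix matches
// the field name, or null when no dynamic field applies.
function resolveDynamicField(fieldName) {
  const match = dynamicFields.find(function (f) {
    const suffix = f.pattern.slice(1); // drop the leading '*'
    return fieldName.endsWith(suffix);
  });
  return match || null;
}

console.log(resolveDynamicField('page_count_i'));  // single-valued int
console.log(resolveDynamicField('tag_ids_is'));    // multi-valued int
console.log(resolveDynamicField('title'));         // null: no pattern matches
```

In practice this means a document can carry a field such as page_count_i without any explicit declaration for it in schema.xml.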

Page 16: Apache Solr + ajax solr

Solr field types are defined in the schema too:

<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>

Solr Schema (3)

Page 17: Apache Solr + ajax solr

Text types can have some magic stuff:

<!-- A text field that only splits on whitespace for exact matching of words -->
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

Solr Schema (4)

Page 18: Apache Solr + ajax solr

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter [...]/>
    <!-- +other filters -->
  </analyzer>
  <analyzer type="query">
    <tokenizer [...]/>
    <filter [...]/>
    <!-- +other filters -->
  </analyzer>
</fieldType>

Solr Schema (5)

Page 19: Apache Solr + ajax solr

Analyzers are components that pre-process input text at index time and/or at search time. It's important to use the same or similar analyzers that process text in a compatible manner at index and query time. For example, if an indexing analyzer lowercases words, then the query analyzer should do the same to enable finding the indexed words.

(from https://wiki.apache.org/solr/)

Solr Schema (6)
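The lowercasing example from the quote can be reproduced with a toy inverted index. This is only an illustrative sketch in plain JavaScript: the analyzers, `buildIndex` and `search` are hypothetical stand-ins and do not reflect how Lucene stores or searches its index.

```javascript
// Two toy analyzers: both split on whitespace, only one lowercases.
function plainAnalyzer(text) {
  return text.split(/\s+/).filter(Boolean);
}
function lowercasingAnalyzer(text) {
  return plainAnalyzer(text).map(function (t) { return t.toLowerCase(); });
}

// Toy inverted index: token -> ids of the documents containing it.
function buildIndex(docs, indexAnalyzer) {
  const index = new Map();
  docs.forEach(function (doc) {
    indexAnalyzer(doc.text).forEach(function (token) {
      if (!index.has(token)) index.set(token, []);
      if (!index.get(token).includes(doc.id)) index.get(token).push(doc.id);
    });
  });
  return index;
}

// A query matches the documents that contain every analyzed query token.
function search(index, queryAnalyzer, query) {
  const tokens = queryAnalyzer(query);
  if (tokens.length === 0) return [];
  return tokens
    .map(function (t) { return index.get(t) || []; })
    .reduce(function (acc, ids) {
      return acc.filter(function (id) { return ids.includes(id); });
    });
}

const docs = [{ id: 'doc1', text: 'Apache Solr overview' }];
const index = buildIndex(docs, lowercasingAnalyzer);

console.log(search(index, plainAnalyzer, 'Solr'));       // [] : the index only holds "solr"
console.log(search(index, lowercasingAnalyzer, 'Solr')); // [ 'doc1' ]
```

The first search fails only because the query analyzer does not lowercase while the index analyzer does, which is exactly the incompatibility the quote warns about.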

Page 20: Apache Solr + ajax solr

Tokenizer examples:

● <tokenizer class="solr.WhitespaceTokenizerFactory"/>

● <tokenizer class="solr.StandardTokenizerFactory"/>

● <tokenizer class="solr.LetterTokenizerFactory"/>

● <tokenizer class="solr.LowerCaseTokenizerFactory"/>

Solr Schema (7)
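To get a feel for how the tokenizers listed above differ, the sketch below approximates three of them in plain JavaScript. These are rough approximations for illustration only; the real Lucene implementations handle Unicode and edge cases more carefully, and the standard tokenizer is left out because its rules are too involved for a short example. The function names and the sample sentence are made up.

```javascript
// WhitespaceTokenizerFactory (approximation): split on whitespace, keep everything else.
function whitespaceTokenize(text) {
  return text.split(/\s+/).filter(function (t) { return t.length > 0; });
}

// LetterTokenizerFactory (approximation): tokens are maximal runs of letters.
function letterTokenize(text) {
  return text.match(/\p{L}+/gu) || [];
}

// LowerCaseTokenizerFactory (approximation): letter tokenizer plus lowercasing.
function lowerCaseTokenize(text) {
  return letterTokenize(text).map(function (t) { return t.toLowerCase(); });
}

const sample = "Solr's REST-like API, v4.10";
console.log(whitespaceTokenize(sample)); // [ "Solr's", 'REST-like', 'API,', 'v4.10' ]
console.log(letterTokenize(sample));     // [ 'Solr', 's', 'REST', 'like', 'API', 'v' ]
console.log(lowerCaseTokenize(sample));  // [ 'solr', 's', 'rest', 'like', 'api', 'v' ]
```

Note how the whitespace variant keeps punctuation attached to the tokens, which is why it suits exact matching, while the letter-based variants discard digits and punctuation entirely.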

Page 21: Apache Solr + ajax solr

Filter examples:

● <filter class="solr.LowerCaseFilterFactory"/>

● <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>

● <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />

● <filter class="solr.GermanNormalizationFilterFactory"/>

● <filter class="solr.GermanLightStemFilterFactory"/>

Solr Schema (8)
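Filters run after the tokenizer and transform or drop tokens one step at a time. The sketch below imitates a stop filter followed by a lowercase filter; it is a simplified illustration, not the Lucene code, and the stop word list is a made-up stand-in for the contents of stopwords.txt.

```javascript
// LowerCaseFilterFactory (approximation): lowercase every token.
function lowerCaseFilter(tokens) {
  return tokens.map(function (t) { return t.toLowerCase(); });
}

// StopFilterFactory (approximation): drop tokens found in the stop word list.
// With ignoreCase set, the comparison is case-insensitive.
function stopFilter(tokens, stopwords, ignoreCase) {
  const normalize = function (t) { return ignoreCase ? t.toLowerCase() : t; };
  const stopSet = new Set(stopwords.map(normalize));
  return tokens.filter(function (t) { return !stopSet.has(normalize(t)); });
}

const stopwords = ['the', 'a', 'of']; // hypothetical contents of stopwords.txt
const tokens = ['The', 'Schema', 'of', 'Solr'];

// Filters are applied in order, each one consuming the previous output.
const filtered = lowerCaseFilter(stopFilter(tokens, stopwords, true));
console.log(filtered); // [ 'schema', 'solr' ]
```

Without ignoreCase, the capitalized "The" would survive the stop filter, which shows why the order and options of the filters in a chain matter.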

Page 22: Apache Solr + ajax solr

Although the schema is already set up, you should really go through it and get rid of all the unnecessary stuff!

Get rid of all the fields and types you don’t need: the schema will be easier to maintain, and everything will look much clearer!

Solr Schema (9)

Page 23: Apache Solr + ajax solr

Data can be indexed into Solr in many ways, depending on your application technology.

● In PHP: you can use Solarium, a Solr client library

● In Java: you can use SolrJ

● Using the Data Import Request Handler

Indexing Solr (1)

Page 24: Apache Solr + ajax solr

Synchronous indexing:

In PHP or Java, you listen for database insertions, deletions and updates, and call the related Solr APIs (possibly through the client libraries).

Indexing Solr (2)
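The mapping from database events to Solr update commands can be sketched as below. The command shapes follow Solr's JSON update format (`add` with a `doc`, `delete` with an `id`), while the event object, the row columns and the helper names are assumptions made for this example. Actually sending the command to Solr's update handler is left out.

```javascript
// Hypothetical mapping from a database row to a Solr document.
// notebook_label_s and user_s are the dynamic fields used later in these slides.
function toSolrDoc(row) {
  return {
    id: String(row.id),
    notebook_label_s: row.label,
    user_s: row.user
  };
}

// Hypothetical database event: { type: 'insert' | 'update' | 'delete', row: {...} }.
function toSolrCommand(event) {
  switch (event.type) {
    case 'insert':
    case 'update':
      // Adding a document whose uniqueKey already exists replaces the old version,
      // so inserts and updates map to the same command.
      return { add: { doc: toSolrDoc(event.row) } };
    case 'delete':
      return { delete: { id: String(event.row.id) } };
    default:
      throw new Error('Unknown event type: ' + event.type);
  }
}

const events = [
  { type: 'insert', row: { id: 7, label: 'My notebook', user: 'alice' } },
  { type: 'delete', row: { id: 3 } }
];
events.forEach(function (event) {
  // Each JSON body would be POSTed to the Solr update handler.
  console.log(JSON.stringify(toSolrCommand(event)));
});
```

A client library such as Solarium or SolrJ hides these payloads behind its own API, but the underlying idea is the same.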

Page 25: Apache Solr + ajax solr

Synchronous indexing, pitfalls:

What if Solr is down? Find a way to make sure the sync is eventually done!

Indexing Solr (3)
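One possible way to cope with Solr being down is to queue the update commands and retry later. The sketch below shows the idea with an in-memory queue; it is an assumption about how this could be done, not something prescribed by the slides. The `send` callback is a hypothetical stand-in for the real call to Solr and is kept synchronous for brevity.

```javascript
// Pending commands wait in the queue until Solr has accepted them.
function createSyncQueue(send) {
  const pending = [];
  return {
    enqueue: function (command) { pending.push(command); },
    size: function () { return pending.length; },
    // Send queued commands in order and stop at the first failure,
    // so nothing is lost and the original ordering is preserved.
    flush: function () {
      while (pending.length > 0) {
        if (!send(pending[0])) return false; // Solr unreachable: retry later
        pending.shift();
      }
      return true;
    }
  };
}

// Simulated Solr that is down at first and then comes back.
let solrIsUp = false;
const received = [];
function fakeSend(command) {
  if (!solrIsUp) return false;
  received.push(command);
  return true;
}

const queue = createSyncQueue(fakeSend);
queue.enqueue({ add: { doc: { id: '1' } } });
queue.enqueue({ delete: { id: '2' } });

console.log(queue.flush(), queue.size()); // false 2 : Solr down, both still pending
solrIsUp = true;
console.log(queue.flush(), queue.size()); // true 0 : Solr back, everything sent
```

An in-memory queue disappears if the application itself restarts, so a real setup would likely keep the pending commands in the database or another durable store.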

Page 26: Apache Solr + ajax solr

AJAX Solr loosely follows the Model-view-controller pattern. The ParameterStore is the model, storing the Solr parameters and, thus, the state of the application. The Manager is the controller; it talks to the ParameterStore, sends requests to Solr, and delegates the response to the widgets for rendering. The widgets are the views, each rendering a part of the interface.

Ajax-Solr (1)
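The division of roles described above can be mimicked in a few lines of plain JavaScript. This toy version does not use AJAX Solr and is not its API: the `Toy*` names are invented for the example, and the request function returns a canned object shaped like a Solr JSON response instead of contacting a server.

```javascript
// Model: holds the Solr parameters, i.e. the state of the application.
function ToyParameterStore() {
  this.params = {};
}
ToyParameterStore.prototype.set = function (name, value) {
  this.params[name] = value;
};

// Controller: owns the store, performs the request,
// and hands the response to every widget.
function ToyManager(executeRequest) {
  this.store = new ToyParameterStore();
  this.widgets = [];
  this.executeRequest = executeRequest; // stand-in for the HTTP call to Solr
}
ToyManager.prototype.addWidget = function (widget) {
  this.widgets.push(widget);
};
ToyManager.prototype.doRequest = function () {
  const response = this.executeRequest(this.store.params);
  this.widgets.forEach(function (widget) { widget.render(response); });
};

// View: renders one part of the interface from the response.
const rendered = [];
const resultCountWidget = {
  render: function (response) {
    rendered.push('Found ' + response.response.numFound + ' documents');
  }
};

const manager = new ToyManager(function (params) {
  // Canned response; no network involved.
  return { response: { numFound: 2, docs: [{ id: '1' }, { id: '2' }] } };
});
manager.addWidget(resultCountWidget);
manager.store.set('q', '*:*');
manager.doRequest();
console.log(rendered[0]); // Found 2 documents
```

Adding another view, such as a tag cloud, means registering one more widget; the store and the manager stay untouched.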

Page 27: Apache Solr + ajax solr

AJAX Solr is a JavaScript application. It offers an autocomplete feature that searches in multiple fields (reported in the result list), a tag cloud based on faceted search, and a result display.

Ajax-Solr (2)

Page 28: Apache Solr + ajax solr

Download zip from https://github.com/evolvingweb/ajax-solr/

Unzip:

# unzip ajax-solr-master.zip

Ajax-Solr - Deployment (1)

Page 29: Apache Solr + ajax solr

Use /examples/reuters-requirejs/ as a starting point for your application. In /examples/reuters-requirejs/js/reuters.js, set the Solr address:

solrUrl: 'http://localhost:8983/solr/',

Ajax-Solr - Deployment (2)

Page 30: Apache Solr + ajax solr

In the same reuters.js, set the fields to be used. Just before:

Manager.addWidget(new AjaxSolr.TagcloudWidget

set the fields variable to the fields you want to use in the tag clouds, e.g.

var fields = ['notebook_label_s', 'user_s'];

Ajax-Solr - Deployment (3)

Page 31: Apache Solr + ajax solr

Now we set parameters for the facet (used in the tag clouds):

var params = {
  facet: true,
  'facet.field': [ 'notebook_label_s', 'user_s' ],
  'facet.limit': 20,
  'facet.mincount': 1,
  'f.notebook_label_s.facet.limit': 50,
  'f.user_s.facet.limit': 50,

Ajax-Solr - Deployment (4)

Page 32: Apache Solr + ajax solr

We may add other parameters here, for example a filter query (fq) that restricts the results:

'fq': 'basket_id_s:' +basket_id

or

'fq': 'my_field_i:8'

Ajax-Solr - Deployment (5)
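It can help to see what request these parameters turn into. The sketch below serializes a parameter object into a Solr query string by hand; it is an illustration, not the code AJAX Solr runs internally. The `toQueryString` helper, the `basket_id` value, and the `q` and `wt` entries are assumptions added for the example, as is the `select` path appended to the Solr address from the earlier slide.

```javascript
// Turn a parameter object into a query string.
// Array values become repeated parameters (facet.field=a&facet.field=b).
function toQueryString(params) {
  const parts = [];
  Object.keys(params).forEach(function (name) {
    const values = Array.isArray(params[name]) ? params[name] : [params[name]];
    values.forEach(function (value) {
      parts.push(encodeURIComponent(name) + '=' + encodeURIComponent(String(value)));
    });
  });
  return parts.join('&');
}

const basket_id = 42; // hypothetical value
const params = {
  q: '*:*',
  facet: true,
  'facet.field': ['notebook_label_s', 'user_s'],
  'facet.mincount': 1,
  fq: 'basket_id_s:' + basket_id,
  wt: 'json'
};

console.log('http://localhost:8983/solr/select?' + toQueryString(params));
```

Pasting the printed URL into a browser is a quick way to check that a filter query returns what you expect before wiring it into the widgets.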

Page 33: Apache Solr + ajax solr

In the Manager.addWidget(new AjaxSolr.AutocompleteWidget({ call, set the fields to be used for the autocomplete:

var fields = ['notebook_label_s', 'predicate_label_s'];

Ajax-Solr - Deployment (6)

Page 34: Apache Solr + ajax solr

In the index.html example page everything is already set up. You’ll just need to customize the tagclouds, by adding/modifying the tagcloud tags, like

<h2>Notebooks</h2>
<div class="tagcloud" id="notebook_label_s"> </div>
<h2>Users</h2>
<div class="tagcloud" id="user_s"> </div>

Ajax-Solr - Deployment (7)

Page 35: Apache Solr + ajax solr

One last thing you may want to do is customize the results output. In the /examples/reuters-requirejs/widgets/ResultWidget.js file, modify the content of:

template: function (doc) {[...]}

Ajax-Solr - Deployment (8)
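As an illustration of what such a customization might look like, here is a standalone sketch of a template function that receives one result document and returns an HTML string. It is not the code shipped in ResultWidget.js: the markup and the `escapeHtml` helper are hypothetical, and it assumes the documents carry the notebook_label_s and user_s fields used in the earlier slides.

```javascript
// Escape values before inserting them into HTML, since they come from indexed data.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical template: one Solr result document in, one HTML fragment out.
function template(doc) {
  return '<div class="result">' +
    '<h2>' + escapeHtml(doc.notebook_label_s) + '</h2>' +
    '<p>by ' + escapeHtml(doc.user_s) + '</p>' +
    '</div>';
}

console.log(template({ id: '1', notebook_label_s: 'Notes <draft>', user_s: 'alice' }));
// <div class="result"><h2>Notes &lt;draft&gt;</h2><p>by alice</p></div>
```

Inside the widget, the function would be a property of the widget object, as in the `template: function (doc) {[...]}` line above.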

Page 36: Apache Solr + ajax solr

Solr: http://lucene.apache.org/solr/
Ajax-Solr: https://github.com/evolvingweb/ajax-solr/
Solarium: http://www.solarium-project.org/
Solr wiki: http://wiki.apache.org/solr/

Resources