Small wins in a small time with Apache Solr
Slides used in a 2-hour long hands-on tutorial on Apache Solr at Dev8D UK: http://wiki.2011.dev8d.org/w/Session-WK16
"This is an introductory tutorial on Apache Solr, an open source enterprise search engine with a restful web interface."
Who am I?
My (Buddhist) name is Upayavira
Consultant with Sourcesense, specialising in search and operational technologies
A member of the Apache Software Foundation
Who are Sourcesense?
Open Source integrator, specialising in:
Search
Business Intelligence
Content Management
Application Lifecycle Management
Offices in London, Amsterdam, Milan and Rome
Committers and Contributors
Search:
Lucene/Solr – contributor
Hibernate Search – committer
Lucene Infinispan integration – lead developer
Apache UIMA – committer
CMS:
Apache Chemistry – contributor
Apache Jackrabbit – contributor
JBoss GateIn Portal – committer
OpenSSO-Alfresco - contributor
What is Lucene?
Lucene is a Java information retrieval library
Provides free text search facilities
Started in 2000, by Doug Cutting
A project of the Apache Software Foundation
It is designed to be embedded in Java apps
What is Solr?
Solr is an enterprise search server based on Lucene
Wraps Lucene with a RESTful web interface
Provides configurable schema
Provides replication functionality
Solr Design
[Diagram labels: Solr instance (UpdateRequestHandler, SearchHandler), user queries, Lucene index, content application]
Prerequisites
Java, preferably Java 6
Apache Solr 1.4.1
http://www.sourcesense.com/dev8d-solr.zip
Prerequisites
Extract your Solr distribution
At a command prompt:
–cd into the unzipped distribution directory
–cd into the example directory
–Enter: java -jar start.jar
Visit http://localhost:8983/solr/ in a browser. If you see a welcome message, your Solr works
Unpack your dev8d-solr.zip file
At another command prompt, cd into your dev8d-solr directory
Checking Solr Works
Visit http://localhost:8983/solr/admin/
You should see the Solr admin page.
Click statistics link
You'll see NumDocs: 0
There's nothing in the index, so searches won't show much
So we need to index some sample content
Indexing Sample Content
In your dev8d-solr directory (extracted from the zip), at a command prompt:
java -jar post.jar wikipedia-basic.xml
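post.jar sends the XML file to Solr's update handler. As a rough illustration of the kind of message involved, this Python sketch builds a Solr <add> message from plain dictionaries. The field names "id" and "title" and their values are placeholders for illustration, not the actual contents of wikipedia-basic.xml.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Build a Solr <add> message from a list of dicts.

    Each dict maps a field name to a single value or a list of values
    (a list produces one <field> element per value).
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical document; the field names are assumptions.
    print(build_add_xml([{"id": "1", "title": "Example"}]))
```

The resulting string could be saved to a file and posted with post.jar in the same way as the sample files.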
Searching
http://localhost:8983/solr/select?q=*:*
Searching
http://localhost:8983/solr/select?q=computers
Searching
http://localhost:8983/solr/select?q=computer systems
Searching
http://localhost:8983/solr/select?q=computers OR systems
Searching
http://localhost:8983/solr/select?q=computers AND systems
Searching
http://localhost:8983/solr/select?q="computer systems"
Searching
http://localhost:8983/solr/select?q="computer systems"~10
Searching
http://localhost:8983/solr/select?q=computers NOT data
Searching
http://localhost:8983/solr/select?q=computers -data
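A browser encodes the spaces and quotes in these queries for you. When calling Solr from a program, the query has to be encoded explicitly. A minimal Python sketch, assuming the same local Solr URL as above (the function name select_url is made up for this example):

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select"


def select_url(query, **params):
    """Return a select URL with q and any extra parameters percent-encoded."""
    pairs = [("q", query)] + sorted(params.items())
    return SOLR_SELECT + "?" + urlencode(pairs)


if __name__ == "__main__":
    for q in ["computers AND systems", '"computer systems"', "computers NOT data"]:
        print(select_url(q))
```

Spaces become "+" and double quotes become "%22", which Solr decodes back into the original query string.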
Searching
http://localhost:8983/solr/select/?q=computers&fl=title
Searching
http://localhost:8983/solr/select/?q=computers&fq=author:yobot
Searching
http://localhost:8983/solr/select/?q=computers&fq=author:yobot&fl=title,author
Searching
http://localhost:8983/solr/select/?q=computers&rows=10&start=10&fl=title
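The rows and start parameters give simple paging: start is the zero-based offset of the first result, so page 2 with 10 rows starts at 10. A small sketch of that arithmetic in Python (the helper names are invented for this example):

```python
from urllib.parse import urlencode


def page_params(page, rows=10):
    """Translate a 1-based page number into Solr's rows/start parameters."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return {"rows": rows, "start": (page - 1) * rows}


def paged_url(query, page, rows=10, fields="title"):
    """Build a select URL for one page of results."""
    params = {"q": query}
    params.update(page_params(page, rows))
    params["fl"] = fields
    return "http://localhost:8983/solr/select/?" + urlencode(params)


if __name__ == "__main__":
    print(paged_url("computers", 2))
```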
Searching
http://localhost:8983/solr/select/?q=title:system&fl=title
Searching
http://localhost:8983/solr/select/?q=computers&fl=title,author&sort=author+desc
Searching
http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author
Searching
http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=lex
Searching
http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count
Searching
http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.mincount=2
Searching
http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.limit=3
Searching
http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.limit=3&debugQuery=true
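If wt=json is added to a facet request, the counts for a field normally come back as one flat list that alternates value, count, value, count. The sketch below pairs them up in Python. The sample response is invented for illustration (only the author "yobot" appears in the queries above), and the flat layout is assumed to be the one in effect.

```python
import json

# Invented sample, shaped like the facet part of a Solr JSON response.
SAMPLE = (
    '{"facet_counts": {"facet_fields": '
    '{"author": ["yobot", 4, "example-user", 2]}}}'
)


def facet_pairs(response_text, field):
    """Return [(value, count), ...] for one facet field."""
    data = json.loads(response_text)
    flat = data["facet_counts"]["facet_fields"][field]
    # Even positions hold the values, odd positions hold the counts.
    return list(zip(flat[0::2], flat[1::2]))


if __name__ == "__main__":
    for value, count in facet_pairs(SAMPLE, "author"):
        print(value, count)
```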
Searching
http://localhost:8983/solr/select?q=computer&wt=json
Searching
http://localhost:8983/solr/select?q=computer&wt=javabin
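The wt=json output is convenient for scripts. A minimal Python sketch of reading the hit count and titles from such a response follows. The sample text is invented, and it assumes documents carry a "title" field as in the fl=title queries above.

```python
import json

# Invented sample, shaped like a Solr JSON search response.
SAMPLE_RESPONSE = (
    '{"responseHeader": {"status": 0}, '
    '"response": {"numFound": 2, "start": 0, '
    '"docs": [{"title": "Computer"}, {"title": "Computer science"}]}}'
)


def summarize(response_text):
    """Return (total hit count, titles of the documents on this page)."""
    response = json.loads(response_text)["response"]
    titles = [doc.get("title") for doc in response["docs"]]
    return response["numFound"], titles


if __name__ == "__main__":
    found, titles = summarize(SAMPLE_RESPONSE)
    print(found, titles)
```

Against a running Solr, the text would come from fetching the wt=json URL, for example with urllib.request.urlopen.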
Indexing
Load wikipedia-basic.xml into a text editor or web browser
Load wikipedia-enhanced.xml into a text editor or browser
Load example/solr/conf/schema.xml into a text editor
Indexing
schema.xml defines field types and fields used in Solr
Equivalent to your database schema in a RDBMS
Indexing
Change these two fields in schema.xml to be of type "string" and add multiValued="true" for each.
<field name="links" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="category" type="string" indexed="true" stored="true" multiValued="true"/>
Indexing
Now add this to the <fields> section of schema.xml:
<field name="source" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="textgen" type="textgen" indexed="true" stored="true" multiValued="true"/>
Now search for the “textgen” field type definition, further up in the file.
Indexing
At the bottom of schema.xml, before the closing </schema> tag, add the following:
<copyField source="text" dest="textgen"/>
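One way to double-check the schema edits before re-indexing is to parse the file and list what it declares. The Python sketch below does this for a small fragment. The surrounding <schema> element and its attributes are assumptions made so the fragment parses on its own, and only the fields from the steps above are included.

```python
import xml.etree.ElementTree as ET

# Fragment for illustration; a real schema file contains much more.
SCHEMA_FRAGMENT = """
<schema name="example" version="1.2">
  <fields>
    <field name="links" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="category" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="source" type="string" indexed="true" stored="true" multiValued="false"/>
    <field name="textgen" type="textgen" indexed="true" stored="true" multiValued="true"/>
  </fields>
  <copyField source="text" dest="textgen"/>
</schema>
"""


def describe(schema_xml):
    """Return ({field name: type}, [(copy source, copy dest), ...])."""
    root = ET.fromstring(schema_xml)
    fields = {f.get("name"): f.get("type") for f in root.iter("field")}
    copies = [(c.get("source"), c.get("dest")) for c in root.iter("copyField")]
    return fields, copies


if __name__ == "__main__":
    fields, copies = describe(SCHEMA_FRAGMENT)
    print(fields)
    print(copies)
```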
Indexing
At your command prompt, in the dev8d-solr directory, execute:
java -jar post.jar wikipedia-enhanced.xml
More Advanced Searching
http://localhost:8983/solr/select?q=computers%20AND%20babbage&facet=true&facet.field=category&facet.mincount=1
More Advanced Searching
http://localhost:8983/solr/terms?terms.fl=text&terms=true&terms.limit=20
More Advanced Searching
http://localhost:8983/solr/terms?terms.fl=textgen&terms=true&terms.limit=20
More Advanced Searching
http://localhost:8983/solr/terms?terms.fl=textgen&terms=true&terms.limit=20&terms.prefix=at
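The terms.prefix request is the basis for simple "suggest as you type" behaviour. A small Python sketch that builds these terms URLs for whatever the user has typed so far (the function name terms_url is made up for this example):

```python
from urllib.parse import urlencode


def terms_url(field, prefix=None, limit=20):
    """Build a terms request for a field, optionally limited to a prefix."""
    params = [("terms.fl", field), ("terms", "true"), ("terms.limit", limit)]
    if prefix:
        params.append(("terms.prefix", prefix))
    return "http://localhost:8983/solr/terms?" + urlencode(params)


if __name__ == "__main__":
    print(terms_url("textgen", prefix="at"))
```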
Thank you
[email protected]
Solr Host Configuration
[Diagram sequence: (1) searches, shards 1 to 3; (2) co-ordinator, shards 1 to 3; (3) load balancer, co-ordinator, shards 1 to 3; (4) load balancer, two co-ordinators, each with shards 1 to 3]
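The slides do not show how a co-ordinator is told about its shards. In Solr's distributed search this is usually done with the shards request parameter, a comma-separated list of shard addresses. A Python sketch under that assumption follows. The host names are hypothetical.

```python
from urllib.parse import urlencode


def distributed_url(coordinator, shards, query):
    """Build a select URL that asks the co-ordinator to query every shard."""
    params = [("q", query), ("shards", ",".join(shards))]
    return "http://%s/solr/select?%s" % (coordinator, urlencode(params))


if __name__ == "__main__":
    # Hypothetical hosts, for illustration only.
    shards = ["shard1:8983/solr", "shard2:8983/solr", "shard3:8983/solr"]
    print(distributed_url("coordinator:8983", shards, "computers"))
```

A load balancer would then spread such requests across the co-ordinators.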