Small wins in a small time with Apache Solr

Upload: sourcesense

Post on 08-May-2015


DESCRIPTION

Slides used in a 2-hour hands-on tutorial on Apache Solr at Dev8D UK: http://wiki.2011.dev8d.org/w/Session-WK16 "This is an introductory tutorial on Apache Solr, an open source enterprise search engine with a restful web interface."

TRANSCRIPT

Page 1: Small wins in a small time with Apache Solr

Small wins in a small time with Apache Solr

Page 2: Small wins in a small time with Apache Solr

Who am I?

My (Buddhist) name is Upayavira

Consultant with Sourcesense, specialising in search and operational technologies

A member of the Apache Software Foundation

Page 3: Small wins in a small time with Apache Solr

Who are Sourcesense?

Open Source integrator, specialising in:

Search

Business Intelligence

Content Management

Application Lifecycle Management

Offices in London, Amsterdam, Milan and Rome

Page 4: Small wins in a small time with Apache Solr

Committers and Contributors

Search:

Lucene/Solr – contributor

Hibernate Search – committer

Lucene Infinispan integration – lead developer

Apache UIMA – committer

CMS:

Apache Chemistry – contributor

Apache Jackrabbit – contributor

JBoss GateIn Portal – committer

OpenSSO-Alfresco – contributor

Page 5: Small wins in a small time with Apache Solr

What is Lucene?

Lucene is a Java information retrieval library

Provides free text search facilities

Started in 2000, by Doug Cutting

A project of the Apache Software Foundation

It is designed to be embedded in Java apps

Page 6: Small wins in a small time with Apache Solr

What is Solr?

Solr is an enterprise search server based on Lucene

Wraps Lucene with a RESTful web interface

Provides configurable schema

Provides replication functionality

Page 7: Small wins in a small time with Apache Solr

Solr Design

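[Diagram: a Solr instance containing an UpdateRequestHandler and a SearchHandler on top of a Lucene index. A content application feeds the UpdateRequestHandler; user queries go to the SearchHandler.]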

Page 8: Small wins in a small time with Apache Solr

Prerequisites

Java, preferably Java 6

Apache Solr 1.4.1

http://www.sourcesense.com/dev8d-solr.zip

Page 9: Small wins in a small time with Apache Solr

Prerequisites

Extract your Solr distribution

At a command prompt:

– cd into the unzipped distribution directory

– cd into the example directory

– Enter: java -jar start.jar

Visit http://localhost:8983/solr/ in a browser. If you see a welcome message, your Solr works

Unpack your dev8d-solr.zip file

At another command prompt, cd into your dev8d-solr directory

Page 10: Small wins in a small time with Apache Solr

Checking Solr Works

Visit http://localhost:8983/solr/admin/

You should see the Solr admin page.

Click the statistics link

You'll see NumDocs: 0

There's nothing in the index, so searches won't show much

So we need to index some sample content

Page 11: Small wins in a small time with Apache Solr

Indexing Sample Content

In your dev8d-solr directory (extracted from the zip), at a command prompt:

java -jar post.jar wikipedia-basic.xml
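For orientation, post.jar is a small helper that sends the XML file to Solr's update handler over HTTP and then issues a commit. A rough Python sketch of those two requests follows. It only builds the requests and does not send them. The update URL and content type are assumptions based on the default example setup, and `build_update_requests` is a hypothetical helper name, not part of Solr.

```python
import urllib.request

# Assumed default update endpoint of the example Solr instance.
UPDATE_URL = "http://localhost:8983/solr/update"


def build_update_requests(xml_bytes, update_url=UPDATE_URL):
    """Return (add_request, commit_request) without sending them."""
    headers = {"Content-Type": "text/xml; charset=utf-8"}
    # Supplying data makes urllib issue a POST instead of a GET.
    add_req = urllib.request.Request(update_url, data=xml_bytes, headers=headers)
    # A commit makes the newly added documents visible to searches.
    commit_req = urllib.request.Request(update_url, data=b"<commit/>", headers=headers)
    return add_req, commit_req


if __name__ == "__main__":
    add_req, commit_req = build_update_requests(b"<add></add>")
    print(add_req.get_method(), add_req.full_url)
    # To actually send them against a running Solr:
    #   urllib.request.urlopen(add_req); urllib.request.urlopen(commit_req)
```

With a running Solr, reading wikipedia-basic.xml as bytes and sending both requests should have much the same effect as the post.jar command.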

Page 12: Small wins in a small time with Apache Solr

Searching

http://localhost:8983/solr/select?q=*:*

Page 13: Small wins in a small time with Apache Solr

Searching

http://localhost:8983/solr/select?q=computers

Page 14: Small wins in a small time with Apache Solr

Searching

http://localhost:8983/solr/select?q=computer systems

Page 15: Small wins in a small time with Apache Solr

Searching

http://localhost:8983/solr/select?q=computers OR systems

Page 16: Small wins in a small time with Apache Solr

Searching

http://localhost:8983/solr/select?q=computers AND systems

Page 17: Small wins in a small time with Apache Solr

Searching

http://localhost:8983/solr/select?q="computer systems"

Page 18: Small wins in a small time with Apache Solr

Searching

http://localhost:8983/solr/select?q="computer systems"~10

Page 19: Small wins in a small time with Apache Solr

Searching

http://localhost:8983/solr/select?q=computers NOT data

Page 20: Small wins in a small time with Apache Solr

Searching

http://localhost:8983/solr/select?q=computers -data
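The queries on these slides contain spaces and quotes, which a browser encodes silently. When calling Solr from code, the query string has to be encoded explicitly. A minimal sketch, assuming the example instance on localhost:8983 (`select_url` and `query_of` are hypothetical helper names):

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed search endpoint of the example Solr instance.
SELECT_URL = "http://localhost:8983/solr/select"


def select_url(q, **params):
    """Build a select URL, percent-encoding the query and extra parameters."""
    query = {"q": q}
    query.update(params)
    return SELECT_URL + "?" + urlencode(query)


def query_of(url):
    """Decode the q parameter back out of a select URL."""
    return parse_qs(urlsplit(url).query)["q"][0]


if __name__ == "__main__":
    # Boolean operators (AND, OR, NOT) must be upper case in Lucene query syntax.
    for q in ["computers OR systems", '"computer systems"~10', "computers -data"]:
        url = select_url(q)
        print(url, "->", query_of(url))
```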

Page 21: Small wins in a small time with Apache Solr

Searching

http://localhost:8983/solr/select/?q=computers&fl=title

Page 22: Small wins in a small time with Apache Solr

Searching

http://localhost:8983/solr/select/?q=computers&fq=author:yobot

Page 23: Small wins in a small time with Apache Solr

Searching

http://localhost:8983/solr/select/?q=computers&fq=author:yobot&fl=title,author

Page 24: Small wins in a small time with Apache Solr

Searching

http://localhost:8983/solr/select/?q=computers&rows=10&start=10&fl=title
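The `start` parameter is a zero-based offset into the result list and `rows` is the page size, so `rows=10&start=10` asks for the second page of ten. The arithmetic can be sketched as follows (`page_params` is a hypothetical helper, not a Solr API):

```python
def page_params(page, rows=10):
    """Return Solr paging parameters for a 1-based page number."""
    if page < 1 or rows < 1:
        raise ValueError("page and rows must be positive")
    # Page 1 starts at offset 0, page 2 at offset `rows`, and so on.
    return {"start": (page - 1) * rows, "rows": rows}


if __name__ == "__main__":
    print(page_params(2, 10))
```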

Page 25: Small wins in a small time with Apache Solr

Searching

http://localhost:8983/solr/select/?q=title:system&fl=title

Page 26: Small wins in a small time with Apache Solr

Searching

http://localhost:8983/solr/select/?q=computers&fl=title,author&sort=author+desc

Page 27: Small wins in a small time with Apache Solr

Searching

http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author

Page 28: Small wins in a small time with Apache Solr

Searching

http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=lex

Page 29: Small wins in a small time with Apache Solr

Searching

http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count

Page 30: Small wins in a small time with Apache Solr

Searching

http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.mincount=2

Page 31: Small wins in a small time with Apache Solr

Searching

http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.limit=3

Page 32: Small wins in a small time with Apache Solr

Searching

http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.limit=3&debugQuery=true
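When the same facet query is requested with `wt=json`, Solr by default returns each facet field as one flat list that alternates value and count. A small sketch of turning that list into pairs. The sample data below is made up for illustration, and `facet_pairs` is a hypothetical helper:

```python
import json

# Invented sample shaped like the facet part of a wt=json response.
SAMPLE_FACETS = """
{"facet_counts": {"facet_fields":
    {"author": ["yobot", 3, "smackbot", 2, "anon", 1]}}}
"""


def facet_pairs(response, field):
    """Convert Solr's flat [value, count, value, count, ...] list into pairs."""
    flat = response["facet_counts"]["facet_fields"][field]
    # Even positions hold the values, odd positions hold the counts.
    return list(zip(flat[0::2], flat[1::2]))


if __name__ == "__main__":
    for value, count in facet_pairs(json.loads(SAMPLE_FACETS), "author"):
        print(value, count)
```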

Page 33: Small wins in a small time with Apache Solr

Searching

http://localhost:8983/solr/select?q=computer&wt=json
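The JSON response is convenient to consume from scripts. The sketch below parses a response of the usual shape (a `responseHeader` plus a `response` object holding `numFound`, `start` and `docs`). The sample document values are invented, and `summarize` is a hypothetical helper:

```python
import json

# Invented sample shaped like a wt=json search response.
SAMPLE_RESPONSE = """
{"responseHeader": {"status": 0, "QTime": 1},
 "response": {"numFound": 2, "start": 0,
              "docs": [{"title": "Computer"},
                       {"title": "Computer science"}]}}
"""


def summarize(raw_json):
    """Return (total hit count, list of titles on this page)."""
    result = json.loads(raw_json)["response"]
    # Use .get() because a document may lack a stored title field.
    titles = [doc.get("title") for doc in result["docs"]]
    return result["numFound"], titles


if __name__ == "__main__":
    total, titles = summarize(SAMPLE_RESPONSE)
    print(total, titles)
```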

Page 34: Small wins in a small time with Apache Solr

Searching

http://localhost:8983/solr/select?q=computer&wt=javabin

Page 35: Small wins in a small time with Apache Solr

Indexing

Page 36: Small wins in a small time with Apache Solr

Indexing

Load wikipedia-basic.xml into a text editor or web browser

Load wikipedia-enhanced.xml into a text editor or browser

Load example/solr/conf/schema.xml into a text editor

Page 37: Small wins in a small time with Apache Solr

Indexing

schema.xml defines field types and fields used in Solr

Equivalent to your database schema in a RDBMS

Page 38: Small wins in a small time with Apache Solr

Indexing

Change these two fields in schema.xml to be of type "string" and add multiValued="true" for each.

<field name="links" type="string" indexed="true" stored="true" multiValued="true"/>

<field name="category" type="string" indexed="true" stored="true" multiValued="true"/>
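A multiValued field is filled by repeating the `<field>` element inside a document in the XML update message. A minimal sketch of generating such a message. The document values are invented, and `solr_add_xml` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET


def solr_add_xml(doc):
    """Build an <add><doc>...</doc></add> message from a dict.

    List or tuple values become repeated <field> elements, which is how
    a multiValued field receives several values.
    """
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(item)
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    # Invented example document with two categories.
    print(solr_add_xml({"id": "1", "category": ["Computing", "History"]}))
```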

Page 39: Small wins in a small time with Apache Solr

Indexing

Now add this to the <fields> section of schema.xml:

<field name="source" type="string" indexed="true" stored="true" multiValued="false"/>

<field name="textgen" type="textgen" indexed="true" stored="true" multiValued="true"/>

Now search for the “textgen” field type definition, further up in the file.

Page 40: Small wins in a small time with Apache Solr

Indexing

At the bottom of schema.xml, before the closing </schema> tag, add the following:

<copyField source="text" dest="textgen"/>

Page 41: Small wins in a small time with Apache Solr

Indexing

At your command prompt, in the dev8d-solr directory, execute:

java -jar post.jar wikipedia-enhanced.xml

Page 42: Small wins in a small time with Apache Solr

More Advanced Searching

http://localhost:8983/solr/select?q=computers%20AND%20babbage&facet=true&facet.field=category&facet.mincount=1

Page 43: Small wins in a small time with Apache Solr

More Advanced Searching

http://localhost:8983/solr/terms?terms.fl=text&terms=true&terms.limit=20

Page 44: Small wins in a small time with Apache Solr

More Advanced Searching

http://localhost:8983/solr/terms?terms.fl=textgen&terms=true&terms.limit=20

Page 45: Small wins in a small time with Apache Solr

More Advanced Searching

http://localhost:8983/solr/terms?terms.fl=textgen&terms=true&terms.limit=20&terms.prefix=at
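To make the effect of `terms.prefix` and `terms.limit` concrete, here is a toy in-memory illustration. It is not Solr code: the term counts are invented, `top_terms` is a hypothetical function, and ordering by descending count with an alphabetical tie-break is an assumption about the default behaviour.

```python
def top_terms(term_counts, prefix="", limit=20):
    """Return up to `limit` (term, count) pairs whose term starts with `prefix`."""
    matches = [(t, c) for t, c in term_counts.items() if t.startswith(prefix)]
    # Highest document count first; ties broken alphabetically.
    matches.sort(key=lambda tc: (-tc[1], tc[0]))
    return matches[:limit]


if __name__ == "__main__":
    # Invented counts standing in for the indexed terms of one field.
    counts = {"at": 5, "atlas": 2, "atom": 7, "be": 9}
    print(top_terms(counts, prefix="at", limit=2))
```

This is the kind of lookup that makes the terms handler useful for autocomplete-style suggestions.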

Page 46: Small wins in a small time with Apache Solr

Thank you

Page 47: Small wins in a small time with Apache Solr

Solr Host Configuration

[Diagram: searches sent to shard 1, shard 2 and shard 3]

Page 48: Small wins in a small time with Apache Solr

Solr Host Configuration

[Diagram: a co-ordinator in front of shard 1, shard 2 and shard 3]
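Conceptually, the co-ordinator sends the query to every shard, collects each shard's ranked hits and merges them into one result list. In Solr 1.4 this is requested by passing a `shards` parameter listing the shard addresses. The merge step can be sketched as below. This is a simplified illustration, not Solr's implementation; the hits are invented and `merge_shard_hits` is a hypothetical function.

```python
import heapq


def merge_shard_hits(shard_results, rows=10):
    """Merge per-shard lists of (doc_id, score) into the overall top `rows`."""
    all_hits = [hit for hits in shard_results for hit in hits]
    # nlargest returns the hits ordered from highest to lowest score.
    return heapq.nlargest(rows, all_hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Invented hits from three shards.
    shard_results = [
        [("a", 0.9), ("b", 0.2)],
        [("c", 0.5)],
        [("d", 0.7)],
    ]
    print(merge_shard_hits(shard_results, rows=3))
```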

Page 49: Small wins in a small time with Apache Solr

Solr Host Configuration

[Diagram: a load balancer in front of a co-ordinator and shard 1, shard 2 and shard 3]

Page 50: Small wins in a small time with Apache Solr

Solr Host Configuration

[Diagram: a load balancer in front of two groups, each with a co-ordinator and shard 1, shard 2 and shard 3]

Page 51: Small wins in a small time with Apache Solr

Solr Host Configuration

[Diagram: a load balancer in front of two groups, each with a co-ordinator and shard 1, shard 2 and shard 3]