"searching with solr" - tyler harms, south dakota code camp 2012

Post on 07-May-2015

DESCRIPTION

"Searching with Solr" by Tyler Harms, given November 10, 2012, at South Dakota Code Camp 2012 in Sioux Falls.

TRANSCRIPT

Tyler Harms, Developer

@harmstyler

tyler@blendinteractive.com

AN INTRODUCTION

Searching with Solr


Saturday, November 10, 12

SEARCHING WITH SOLR

Why Implement Solr?

• Does your site need search?
• Is Google enough?
• Do you need/want to control rankings?
• Just text, or Structured Data?


What is Solr?


Solr is a standalone enterprise search server with a REST-like API. You put documents in it [...] over HTTP. You query it via HTTP GET and receive [...] results.


Solr Versions

• Current Version(s)
  • Solr 3.6.1
  • Solr 4
• Released Versions are always stable


$ wget http://(...)/3.6.1/apache-solr-3.6.1.tgz

$ tar -xzf apache-solr-3.6.1.tgz

$ cd apache-solr-3.6.1/example/

$ java -jar start.jar

(a lot of java log...)


Search Alternatives

• Google
• Lucene
• Elasticsearch
• Whoosh
• Xapian
• Many Others


NOT a Database Replacement

• Solr is designed to live alongside your website as a separate web app


[Diagram: Frontend Servers [1..n], Database Master, Database Slaves [0..n], Solr Master, Solr Slaves [0..n]]


Scaling Solr

• Master/Slave Architecture
  • Write to master -> Read from slaves
• Multicore Setup
  • Multiple Solr ‘cores’ running alongside each other within the same install


Solr’s Data Model

• Solr maintains a collection of documents
• A document is a collection of fields and values
• A field can occur multiple times in a doc
• Documents are immutable
  • They can be deleted and replaced by new versions, however.


Querying

• HTTP request
  • http://localhost:8983/solr/select?q=blend&start=0&rows=10
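As a rough illustration of how a client might assemble that request, here is a minimal Python sketch. The function name build_select_url is hypothetical; the base URL and the q, start, and rows parameters come from the slide.

```python
from urllib.parse import urlencode

# Base URL taken from the slide; adjust host and port for your install.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_select_url(query, start=0, rows=10):
    """Return a select URL with the query string safely URL-encoded."""
    params = {"q": query, "start": start, "rows": rows}
    return SOLR_SELECT_URL + "?" + urlencode(params)


print(build_select_url("blend"))
```

Encoding the parameters through urlencode matters once a query contains spaces, quotes, or colons, which Solr queries often do.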


Solr Query Syntax

• blend (value)
• company:blend (field:value)
• title:"Searching with Solr" AND text:apache
• id:[* TO *]
• *:* (all fields : all values)


Using Solr

• Getting Data into Solr
• Getting Data out of Solr


Getting Data into Solr

• POST it


<add>
  <doc>
    <field name="abstract">Lorem ipsum</field>
    <field name="company">Blend Interactive</field>
    <field name="text">Lorem Ipsum</field>
    <field name="title">Some Title</field>
  </doc>
  [<doc> ... </doc>[<doc> ... </doc>]]
</add>
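One way to produce an <add> message like this from application data is sketched below in Python. The helper name build_add_message is hypothetical; it uses only the standard library and escapes special characters in field values.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Serialize a list of {field name: value} dicts as a Solr <add> message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


message = build_add_message([
    {"title": "Some Title", "company": "Blend Interactive"},
])
print(message)
```

Building the XML through an element tree rather than string concatenation avoids malformed messages when a value contains characters such as & or <.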


Committing

• Nothing shows up in the index until you commit
• You can just POST <commit/> to:
  • http://<host>:<port>/solr/update
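A minimal Python sketch of that POST follows. It only builds the request object and does not send it. The helper name build_update_request, the default host localhost, and the text/xml content type are assumptions for illustration; port 8983 is the one used elsewhere in these slides.

```python
from urllib.request import Request


def build_update_request(xml_body, host="localhost", port=8983):
    """Build (but do not send) a POST request for Solr's update handler."""
    url = "http://{}:{}/solr/update".format(host, port)
    return Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


commit = build_update_request("<commit/>")
print(commit.get_method(), commit.full_url)
# To actually send it against a running server: urllib.request.urlopen(commit)
```

The same helper can carry an <add> or <delete> body, followed by a separate <commit/> request to make the change visible.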


Getting Data out of Solr

• http://localhost:8983/solr/select/?q=solr


<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">19</int>
    <lst name="params">
      <str name="q">solr</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="abstract">A brief introduction to using Apache Solr for implementing search for your website.</str>
      <str name="django_ct">codecamp.session</str>
      <str name="django_id">19</str>
      <str name="id">codecamp.session.19</str>
      <str name="text">Searching with Solr: An Introduction A brief introduction to using Apache Solr for implementing search for your website.</str>
      <str name="title">Searching with Solr: An Introduction</str>
    </doc>
  </result>
</response>
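A client has to pull the hit count and the documents out of that XML. The Python sketch below shows one way to do it against an abbreviated copy of the response above. The function name parse_results is hypothetical, and the sketch only handles single-valued fields.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the response shown above.
SAMPLE = """<response>
  <lst name="responseHeader"><int name="status">0</int></lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id">codecamp.session.19</str>
      <str name="title">Searching with Solr: An Introduction</str>
    </doc>
  </result>
</response>"""


def parse_results(xml_text):
    """Return (numFound, list of {field name: text}) from a Solr XML response."""
    root = ET.fromstring(xml_text)
    result = root.find("result")
    docs = [
        {field.get("name"): field.text for field in doc}
        for doc in result.findall("doc")
    ]
    return int(result.get("numFound")), docs


found, docs = parse_results(SAMPLE)
print(found, docs[0]["title"])
```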


Getting Data out of Solr: JSON

• http://localhost:8983/solr/select/?q=solr&wt=json


{
  "responseHeader": {
    "status": 0,
    "QTime": 0,
    "params": {
      "wt": "json",
      "q": "solr"
    }
  },
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [
      {
        "django_id": "19",
        "title": "Searching with Solr: An Introduction",
        "text": "Searching with Solr: An Introduction\nA brief introduction to using Apache Solr for implementing search for your website.",
        "abstract": "A brief introduction to using Apache Solr for implementing search for your website.",
        "django_ct": "codecamp.session",
        "id": "codecamp.session.19"
      }
    ]
  }
}
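With wt=json the same data can be read with any JSON parser. A small Python sketch, using an abbreviated copy of the response above and a hypothetical helper named titles_by_id:

```python
import json

# Abbreviated copy of the JSON response shown above.
SAMPLE_JSON = """
{"responseHeader": {"status": 0, "QTime": 0,
                    "params": {"wt": "json", "q": "solr"}},
 "response": {"numFound": 1, "start": 0, "docs": [
     {"django_id": "19",
      "title": "Searching with Solr: An Introduction",
      "id": "codecamp.session.19"}]}}
"""


def titles_by_id(json_text):
    """Map each returned document's id to its title."""
    body = json.loads(json_text)
    return {doc["id"]: doc.get("title") for doc in body["response"]["docs"]}


print(titles_by_id(SAMPLE_JSON))
```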


Deleting Data from Solr

• POST it


<delete><id>codecamp.session.19</id></delete>
<delete><query>company:blend</query></delete>
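The two delete forms, by id and by query, could be generated as in the Python sketch below. The function names are hypothetical; the message shapes are the ones shown on the slide.

```python
import xml.etree.ElementTree as ET


def delete_by_id(doc_id):
    """Build a <delete> message that removes one document by its unique id."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = doc_id
    return ET.tostring(delete, encoding="unicode")


def delete_by_query(query):
    """Build a <delete> message that removes every document matching a query."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "query").text = query
    return ET.tostring(delete, encoding="unicode")


print(delete_by_id("codecamp.session.19"))
print(delete_by_query("company:blend"))
```

As with adds, a delete only takes effect in search results after a commit.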


The Solr Schema

• schema.xml
  • Defines ‘types’ used in the webapp
  • Defines the fields
  • Defines ‘copyfields’
  • Read the schema inside the example project for more


The Solr Schema

• Types
  • Define how a field and query should be processed
    • Word Stemming
    • Case Folding
    • How would you handle a search for ‘C.I.A.’?
  • Dates, ints, floats, etc. are defined here as well
  • 2 Modes
    • Index Time
    • Query Time


<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>
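To make the index-time versus query-time distinction concrete, the following Python sketch imitates the two analyzer chains above. It is a toy model, not Solr's implementation: the stopword set and synonym group are hypothetical stand-ins for stopwords.txt and synonyms.txt, and only single-word synonyms are handled.

```python
# Hypothetical stand-ins for stopwords.txt and synonyms.txt.
STOPWORDS = {"the", "a", "of"}
SYNONYM_GROUPS = [["tv", "television"]]


def whitespace_tokenize(text):
    """Split on whitespace only; case and punctuation are left untouched."""
    return text.split()


def stop_filter(tokens):
    """Drop stopwords, comparing case-insensitively (ignoreCase="true")."""
    return [t for t in tokens if t.lower() not in STOPWORDS]


def synonym_filter(tokens):
    """Replace a token with its whole synonym group (expand="true")."""
    expanded = []
    for token in tokens:
        group = next((g for g in SYNONYM_GROUPS if token.lower() in g), None)
        expanded.extend(group if group else [token])
    return expanded


def analyze_for_index(text):
    """Index-time chain: tokenizer, then stop filter."""
    return stop_filter(whitespace_tokenize(text))


def analyze_for_query(text):
    """Query-time chain: tokenizer, then synonym filter, then stop filter."""
    return stop_filter(synonym_filter(whitespace_tokenize(text)))


print(analyze_for_index("The history of TV"))
print(analyze_for_query("the TV"))
```

In this sketch the indexed token TV and the query tokens tv and television differ in case, because neither chain lowercases its output. That is the kind of case-folding question the Types slide raises.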


Fields

• The elements of a document
• Both Predefined and Dynamic
• Fields may occur multiple times
• May be indexed and/or stored


<fields>
  <!-- general -->
  <field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
  <field name="django_ct" type="string" indexed="true" stored="true" multiValued="false" />
  <field name="django_id" type="string" indexed="true" stored="true" multiValued="false" />
  <!-- dynamic -->
  <dynamicField name="*_i" type="sint" indexed="true" stored="true"/>
  <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
  <dynamicField name="*_l" type="slong" indexed="true" stored="true"/>
  <dynamicField name="*_t" type="text" indexed="true" stored="true"/>
  <dynamicField name="*_b" type="boolean" indexed="true" stored="true"/>
  <dynamicField name="*_f" type="sfloat" indexed="true" stored="true"/>
  <dynamicField name="*_d" type="sdouble" indexed="true" stored="true"/>
  <dynamicField name="*_dt" type="date" indexed="true" stored="true"/>
  <!-- app -->
  <field name="bio" type="text" indexed="true" stored="true" multiValued="false" />
  <field name="title" type="text" indexed="true" stored="true" multiValued="false" />
  <field name="text" type="text" indexed="true" stored="true" multiValued="false" />
  <field name="abstract" type="text" indexed="true" stored="true" multiValued="false" />
  <field name="full_name" type="text" indexed="true" stored="true" multiValued="false" />
  <field name="company" type="text" indexed="true" stored="true" multiValued="false" />
</fields>
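The Python sketch below illustrates how predefined and dynamic fields relate: an explicitly declared name wins, otherwise the name is matched against the wildcard patterns. It is a simplified model using a subset of the schema above. The function name resolve_field_type is hypothetical, and trying longer patterns first is an assumption of this sketch rather than a statement about Solr's exact matching order.

```python
from fnmatch import fnmatchcase

# A subset of the explicit and dynamic fields declared in the schema above.
EXPLICIT_FIELDS = {"id": "string", "title": "text", "company": "text"}
DYNAMIC_FIELDS = {"*_i": "sint", "*_s": "string", "*_t": "text",
                  "*_f": "sfloat", "*_dt": "date"}


def resolve_field_type(name):
    """Return the type for a field name, or None if nothing matches."""
    if name in EXPLICIT_FIELDS:
        return EXPLICIT_FIELDS[name]
    # Try longer patterns first so the most specific suffix is preferred.
    for pattern in sorted(DYNAMIC_FIELDS, key=len, reverse=True):
        if fnmatchcase(name, pattern):
            return DYNAMIC_FIELDS[pattern]
    return None


for field_name in ("id", "price_f", "created_dt", "notes"):
    print(field_name, resolve_field_type(field_name))
```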


Copy Fields

• Two Main Uses
  • Analyze fields in different ways
  • Concatenate Fields


<copyField source="bio" dest="df_text" />
<copyField source="year" dest="century" maxChars="2"/>

2000 would be stored as 20
Useful for custom faceting
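The maxChars behaviour can be pictured with the short Python sketch below. It is a simplified model, not Solr code: the function name copy_field is hypothetical, and it overwrites the destination rather than appending to a multi-valued field.

```python
def copy_field(doc, source, dest, max_chars=None):
    """Return a copy of doc with doc[source] copied into doc[dest].

    max_chars mimics the maxChars attribute by keeping only the first
    max_chars characters of the copied value.
    """
    result = dict(doc)
    if source in doc:
        value = str(doc[source])
        result[dest] = value if max_chars is None else value[:max_chars]
    return result


print(copy_field({"year": "2000"}, "year", "century", max_chars=2))
```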


The Solr Config File

• solrconfig.xml
  • Defines request handlers, defaults, & caches
  • Read the solrconfig.xml inside the example project for more


Other Solr Tools

• Debug Query
• Boost Functions
• Search Faceting
• Search Filters
• Search Highlighting
• Solr Admin


Debug Query Option

• Add &debugQuery=on to request parameters
• Returns a parsed form of the query


<lst name="debug">
  <str name="rawquerystring">solr</str>
  <str name="querystring">solr</str>
  <str name="parsedquery">text:solr</str>
  <str name="parsedquery_toString">text:solr</str>
  <lst name="explain">
    <str name="codecamp.session.19">
1.2147729 = (MATCH) fieldWeight(text:solr in 17), product of:
  1.4142135 = tf(termFreq(text:solr)=2)
  3.9267395 = idf(docFreq=2, maxDocs=56)
  0.21875 = fieldNorm(field=text, doc=17)
    </str>
  </lst>
</lst>
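The numbers in the explain output are consistent with Lucene's classic TF-IDF formulas, where tf is the square root of the term frequency and idf is 1 + ln(maxDocs / (docFreq + 1)). The Python sketch below recomputes the score from the values shown. The function names are hypothetical, and the fieldNorm value is simply copied from the output rather than derived.

```python
import math


def tf(term_freq):
    """Term-frequency factor: square root of the raw term count."""
    return math.sqrt(term_freq)


def idf(doc_freq, max_docs):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(term_freq, doc_freq, max_docs, field_norm):
    """Product of tf, idf, and the stored field norm."""
    return tf(term_freq) * idf(doc_freq, max_docs) * field_norm


score = field_weight(term_freq=2, doc_freq=2, max_docs=56, field_norm=0.21875)
print(round(score, 4))
```

Small differences in the last digits are expected because Lucene computes these values in single precision.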


Boost Function

• Allows you to influence results at query time
• Really useful for tuning scoring
• You can also boost at index time


q=blend&qf=text^2 company
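The qf parameter is read by the DisMax family of query parsers, so a request using it normally selects one of them. The Python sketch below builds such a URL. The defType=dismax parameter is an assumption, since the slide does not show which parser was configured, and the function name build_boosted_url is hypothetical.

```python
from urllib.parse import urlencode


def build_boosted_url(query, field_boosts,
                      base="http://localhost:8983/solr/select"):
    """Build a DisMax query URL; field_boosts maps field -> boost or None."""
    qf = " ".join(
        name if boost is None else "{}^{}".format(name, boost)
        for name, boost in field_boosts.items()
    )
    # defType=dismax is an assumption: the slide does not show the parser.
    params = {"q": query, "defType": "dismax", "qf": qf}
    return base + "?" + urlencode(params)


print(build_boosted_url("blend", {"text": 2, "company": None}))
```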


Solr Faceting

• What is a facet?
  • “Interaction style where users filter a set of items by progressively selecting from only valid values of a faceted classification system” - Keith Instone, SOASIS&T, July 8, 2004
• What does it look like?
• Make sure to use an untokenized field (e.g. string)
  • “San Jose” != “san”+“jose”


q=*:*
facet=on
facet.field=company
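With wt=json, Solr's default layout returns each facet field as a flat list that alternates value and count. The Python sketch below pairs them up. The response fragment is hypothetical (the company names and counts are made up), and the function name facet_counts is an illustration only.

```python
import json

# Hypothetical response fragment; the values and counts are made up.
SAMPLE = """
{"facet_counts": {"facet_fields": {
    "company": ["Blend Interactive", 3, "Acme", 1]}}}
"""


def facet_counts(json_text, field):
    """Return [(value, count), ...] for one facet field."""
    flat = json.loads(json_text)["facet_counts"]["facet_fields"][field]
    # The flat layout alternates value, count, value, count, ...
    return list(zip(flat[0::2], flat[1::2]))


print(facet_counts(SAMPLE, "company"))
```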


Solr Filter Query

• Used to narrow your search query
• Restrict the super set of documents that can be returned
• ‘fq’ parameter (short for Filter Query)


q=*:*
fq=company:blend


Search Highlighting

• Allow Solr to generate your highlight


hl=true
hl.simple.pre=<b>
hl.simple.post=</b>
hl.fragsize=200
hl.requireFieldMatch=false
hl.fl=text bio title
hl.snippets=1
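When highlighting is on, a wt=json response carries a separate highlighting section keyed by document id, then by field name. The Python sketch below reads snippets from such a section. The response fragment is hypothetical, and the function name snippets_for is an illustration only.

```python
import json

# Hypothetical fragment of a wt=json response with highlighting enabled.
SAMPLE = """
{"highlighting": {"codecamp.session.19": {
    "text": ["Searching with <b>Solr</b>: An Introduction"]}}}
"""


def snippets_for(json_text, doc_id, field):
    """Return the highlighted snippets for one document field (may be empty)."""
    highlighting = json.loads(json_text).get("highlighting", {})
    return highlighting.get(doc_id, {}).get(field, [])


print(snippets_for(SAMPLE, "codecamp.session.19", "text"))
```

Returning an empty list for a missing document or field keeps the calling template code simple, since not every hit has a snippet for every field.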


Solr Admin

• http://localhost:8983/solr/admin/
• Built-in app for testing all search options
  • Field Analysis
  • Schema Browser
  • Full Query Interface
  • Solr Statistics
  • Solr Information
  • Many More Options


Solr/Browse

• Test your search configuration using the /browse requestHandler


Resources

• Apache Solr Website
  • http://lucene.apache.org/solr/
  • Wiki, mailing list, bugs/features

• Books
