Advanced Search with Solr & django-haystack

Post on 18-Dec-2014

DESCRIPTION

Search and information discovery is a huge part of almost any modern site. Solr is an incredibly powerful search tool that allows us to quickly add advanced search capabilities such as full-text search, faceting, autocomplete and spelling suggestions to our projects without much effort. We will be using 'django-haystack' to communicate between Django and Solr.

TRANSCRIPT

ADVANCED SEARCH WITH

SOLR + DJANGO-HAYSTACK

MARCEL CHASTAIN

LA DJANGO – 2014-09-30

WHAT WE’LL COVER

1. THE PITCH:

The Problem With Search

The Solution(s)

Overall Architecture of System with Django/Solr/Haystack

2. THE GOOD STUFF:

Indexing Data for Search

Querying the Search Index

Advanced Search Methods

Resources

THE PITCH

OR, “WHY ANY OF THIS MATTERS”

THE PROBLEM

1. Sites with stored information are ONLY as useful as they are at retrieving and displaying that information

THE PROBLEM

2. Users have high expectations of search (thanks, Google)

• Spelling Suggestions

• Hit Highlighting

• “Related Searches”

• Distance/GeoSpatial Search

• Faceting

THE PROBLEM

3. Good search involves lots of challenges

• Stemming:

User Searches For → Word “Stem”

“argue”, “argues”, “argued” → “argu”

“argument”, “arguments” → “argument”

And more..!

• Synonyms

• Acronyms

• Non-ASCII characters

• Stop words (“and”, “to”, “a”)

• Calculating relevance

• Performance with millions/billions(!) of documents
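To make the stemming example concrete, here is a toy suffix-stripping stemmer. It is only a sketch: the suffix list and the minimum stem length of 4 are assumptions picked to reproduce the words on the slide, and the stemmers that ship with Solr apply far more rules than this.

```python
def toy_stem(word, min_stem_len=4):
    """Strip one common English suffix. A toy, not a real Porter stemmer."""
    word = word.lower()
    # Try longer suffixes first so "argues" loses "es" rather than just "s".
    for suffix in ("es", "ed", "s", "e"):
        stem = word[: len(word) - len(suffix)]
        if word.endswith(suffix) and len(stem) >= min_stem_len:
            return stem
    return word


for w in ["argue", "argues", "argued", "argument", "arguments"]:
    print(w, "->", toy_stem(w))
```

Indexing and querying both pass through the same stemmer, which is why a search for “argued” can match a document that only contains “argues”.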

THE SOLUTION

“Information Retrieval Systems”, a.k.a. Search Engines

SOLR

THE BACKEND

WHAT IS SOLR?

Open-source enterprise search

Java-based

Created in 2004

Built on Apache Lucene

Most popular enterprise search engine

Apache 2.0 License

Built for millions or billions of documents

WHAT DOES IT DO?

• Full-text search

• Hit highlighting

• Faceted search

• Clustering/replication/sharding

• Database integration

• Rich document (Word, PDF, etc.) handling

• Geospatial search

• Spelling corrections/suggestions

• … loads and loads more

WHO USES SOLR?

HOW CAN WE USE IT WITH DJANGO?

Haystack

From the homepage:

(http://haystacksearch.org/)

LOOK FAMILIAR?

Query style

Declarative search index definitions

THE GOOD STUFF

INSTALLING, CONFIGURING & USING SOLR/HAYSTACK

WHO DOES WHAT

Solr:

• Provides API for submitting to & querying from index

• Stores actual index data

• Manages fields/data types in xml config (‘schema.xml’)

Haystack:

• Manages connection(s) to Solr

• Provides familiar API for querying

• Uses templates and declarative search index definitions

• Helps generate Solr XML config

• Management commands to index content

• Generic views/forms for common search use-cases

• Hooks into signals to keep data up-to-date

PART 1: LET’S MAKE AN INDEX

0. GITHUB REPO

git clone https://github.com/marcelchastain/haystackdemo

1. SETUP SOLR

(from github repo root)

./solr_download.sh

(or, manually)

wget http://apache.mirrors.pair.com/lucene/solr/4.10.1/solr-4.10.1.tgz

tar -xzvf solr-4.10.1.tgz

ln -s ./solr-4.10.1 ./solr

The one file to care about:

• solr/example/solr/collection1/conf/schema.xml

Stores field definitions and data types. Frequently updated during development

2. RUN SOLR

(from github repo root)

./solr_start.sh

(or, manually)

cd solr/example && java -jar start.jar

Requires Java 1.7+. To install on Debian/Ubuntu:

sudo apt-get install openjdk-7-jre-headless

3. INSTALL HAYSTACK

(CWD haystackdemo/)

sudo apt-get install python-pip python-virtualenv

virtualenv env && source env/bin/activate

(from github repo root)

pip install -r requirements.txt

(or, manually)

pip install Django==1.6.7 django-haystack

4. HAYSTACK SETTINGS

INSTALLED_APPS = [
    # 'django.contrib.admin', etc.
    'haystack',
    # then your usual apps
    'myapp',
]

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
        'URL': 'http://127.0.0.1:8983/solr',
    },
}

HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'

5. THE MODEL(S)

6. SYNCDB & INITIAL DATA

(CWD haystackdemo/demo/)

./manage.py syncdb

./manage.py loaddata restaurants

7. DEFINE SEARCH INDEX

myapp/search_indexes.py
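The code from this slide did not survive in the transcript, so the following is only a sketch of what a declarative search index usually looks like in Haystack. The `Note` model and its `title` and `pub_date` fields are assumptions (the model name is guessed from the `note_text.txt` template path and the "Indexing 6 notes" output); the demo repo's actual index may differ.

```python
# myapp/search_indexes.py
from haystack import indexes

from myapp.models import Note  # assumed model; field names below are hypothetical


class NoteIndex(indexes.SearchIndex, indexes.Indexable):
    # Exactly one field is marked document=True: the main searchable text.
    # use_template=True renders it from
    # templates/search/indexes/myapp/note_text.txt
    text = indexes.CharField(document=True, use_template=True)
    title = indexes.CharField(model_attr="title")
    pub_date = indexes.DateTimeField(model_attr="pub_date")

    def get_model(self):
        return Note

    def index_queryset(self, using=None):
        # The objects returned here are what update_index sends to Solr.
        return self.get_model().objects.all()
```

Haystack discovers this class automatically because it lives in a module named `search_indexes.py` inside an installed app.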

7.5 BOOSTING FIELD RELEVANCE

Some fields are simply more relevant!

(Note: changes to field boosts require a reindex)
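As a sketch of field boosting, Haystack accepts a `boost` keyword argument on an index field. The `Note` model, the `title` field, and the value 1.125 are illustrative assumptions, not the values used in the talk.

```python
# myapp/search_indexes.py
from haystack import indexes

from myapp.models import Note  # assumed model


class NoteIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    # A boost above 1 makes matches in this field count for more
    # when relevance is calculated; 1.125 is an arbitrary example value.
    title = indexes.CharField(model_attr="title", boost=1.125)

    def get_model(self):
        return Note
```

Because the boost is applied at index time, existing documents keep their old weighting until they are reindexed.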

8. CREATE A TEMPLATE FOR INDEXED TEXT

templates/search/indexes/myapp/note_text.txt

9. UPDATE SOLR SCHEMA

(CWD: haystackdemo/demo/)

./manage.py build_solr_schema > ../solr/example/solr/collection1/conf/schema.xml

Which adds:

* Restart Solr for the changes to take effect

10. REBUILD INDEX

(CWD haystackdemo/demo/)

$ ./manage.py update_index

Indexing 6 notes

PART 2: LET’S GET TO QUERYIN’

SIMPLE SEARCHQUERYSETS
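The queries from this slide are missing from the transcript. As a hedged sketch, these are typical `SearchQuerySet` calls from a Django shell; the `Note` model, the `title` and `pub_date` index fields, and the search terms are assumptions, and a running Solr with a populated index is required.

```python
from haystack.query import SearchQuerySet

from myapp.models import Note  # assumed model

# Restrict results to one model's documents.
sqs = SearchQuerySet().models(Note)

# "content" is Haystack's alias for the document=True field.
pizza = sqs.filter(content="pizza")

# auto_query parses user-style input: quoted phrases and -exclusions.
picky = sqs.auto_query('pizza -"deep dish"')

# Calls chain lazily, much like a Django QuerySet.
recent = sqs.filter(content="pizza").exclude(title="closed").order_by("-pub_date")

for result in recent[:5]:
    # result.object loads the underlying model instance from the database.
    print("%s %s" % (result.score, result.object))
```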

GREAT, WHAT ABOUT FROM A BROWSER?

EASY MODE

urls.py

templates/search/search.html

Full-document search
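The `urls.py` content for "easy mode" is not in the transcript. A minimal sketch, assuming the Django 1.6 URL style used elsewhere in this talk, is to include Haystack's bundled URLconf, which wires up its default search view and form:

```python
# urls.py (Django 1.6 style)
from django.conf.urls import patterns, include, url

urlpatterns = patterns(
    "",
    # Haystack's default SearchView renders templates/search/search.html.
    url(r"^search/", include("haystack.urls")),
)
```

With this in place, the only thing left to write is the `search.html` template that displays the form and the results.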

HAYSTACK COMPONENTS TO EXTEND

• haystack.forms.SearchForm

Django form with an extendable .search() method. Define additional fields on the form, then incorporate them in the .search() method’s logic

• haystack.views.SearchView

Class-based view made to be flexible for common search cases
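As a sketch of extending `SearchForm`, the form below adds two optional date fields and applies them inside `.search()`. It follows the pattern in the Haystack documentation; the `pub_date` index field is an assumption about what the index contains.

```python
from django import forms
from haystack.forms import SearchForm


class DateRangeSearchForm(SearchForm):
    start_date = forms.DateField(required=False)
    end_date = forms.DateField(required=False)

    def search(self):
        # Let the base form run the keyword query first.
        sqs = super(DateRangeSearchForm, self).search()

        if not self.is_valid():
            return self.no_query_found()

        # Narrow by the extra fields only when the user filled them in.
        if self.cleaned_data["start_date"]:
            sqs = sqs.filter(pub_date__gte=self.cleaned_data["start_date"])
        if self.cleaned_data["end_date"]:
            sqs = sqs.filter(pub_date__lte=self.cleaned_data["end_date"])

        return sqs
```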

PART 3: FEATURES

HIT HIGHLIGHTING

Instead of referring to a context variable directly, use the {% highlight %} tag
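To show what hit highlighting produces, here is a toy highlighter in plain Python. It is not Haystack's implementation: the function name is hypothetical, and the `span` tag with a `highlighted` class is an assumption about typical output markup. It does no excerpting or HTML escaping.

```python
import re


def toy_highlight(text, query, tag="span", css_class="highlighted"):
    """Wrap each whole-word occurrence of a query term in an HTML tag."""
    terms = [re.escape(term) for term in query.split()]
    if not terms:
        return text
    # Match any query term as a whole word, ignoring case.
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    return pattern.sub(
        lambda m: '<{0} class="{1}">{2}</{0}>'.format(tag, css_class, m.group(0)),
        text,
    )


print(toy_highlight("Best tacos in LA", "tacos"))
```

The template tag does this work for you in the template, so the view does not need to pre-process result text.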

SPELLING SUGGESTIONS

Update connection’s settings dictionary + reindex

Use spelling_suggestion() method
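A sketch of both steps follows. `INCLUDE_SPELLING` is the Haystack connection setting for this feature; the helper `suggestion_for` is a hypothetical wrapper, written to accept any object with `SearchQuerySet`'s methods so the snippet can be read on its own. Haystack's documentation also describes Solr-side spellcheck configuration, which is not shown here.

```python
# settings.py: enable spelling support on the connection, then reindex.
HAYSTACK_CONNECTIONS = {
    "default": {
        "ENGINE": "haystack.backends.solr_backend.SolrEngine",
        "URL": "http://127.0.0.1:8983/solr",
        "INCLUDE_SPELLING": True,
    },
}


def suggestion_for(sqs, user_query):
    """Return the backend's suggested spelling for user_query, if any.

    sqs is expected to be a haystack.query.SearchQuerySet, e.g.
    suggestion_for(SearchQuerySet(), "piza").
    """
    return sqs.auto_query(user_query).spelling_suggestion()
```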

AUTOCOMPLETE

Create another search index field using EdgeNgramField + reindex

Use the .autocomplete() method on a SearchQuerySet
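To illustrate why an edge n-gram field enables autocomplete, this toy example builds the prefixes such a field would index and looks words up by what the user has typed so far. The function names and the gram sizes (2 to 15) are assumptions for illustration; Solr does this inside its analysis chain, not in Python.

```python
from collections import defaultdict


def edge_ngrams(word, min_size=2, max_size=15):
    """Return the leading substrings ("edge n-grams") of a word."""
    word = word.lower()
    longest = min(len(word), max_size)
    return [word[:n] for n in range(min_size, longest + 1)]


def build_prefix_index(words, min_size=2, max_size=15):
    """Map every edge n-gram to the set of words that start with it."""
    index = defaultdict(set)
    for word in words:
        for gram in edge_ngrams(word, min_size, max_size):
            index[gram].add(word)
    return index


index = build_prefix_index(["pizza", "pizzeria", "pita", "taco"])
print(edge_ngrams("pizza"))
print(sorted(index["piz"]))
```

Because every prefix is stored at index time, a partial query becomes an ordinary exact-match lookup, which is what keeps autocomplete fast.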

FACETING

Add faceting to search index definition

Regenerate schema.xml and reindex content

./manage.py build_solr_schema > ../solr/example/solr/collection1/conf/schema.xml

./manage.py update_index

FACETING

From a shell:
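The shell session from this slide is not in the transcript. As a stand-in, this toy function shows what faceting computes: a count of documents per distinct value of a field. The documents and the `cuisine` field are made up, and the result shape only loosely mirrors the `fields` part of what a `facet_counts()` call returns in Haystack.

```python
from collections import Counter


def toy_facet_counts(documents, field):
    """Count documents per distinct value of one field, most common first."""
    counts = Counter(doc[field] for doc in documents if field in doc)
    return {"fields": {field: counts.most_common()}}


docs = [
    {"name": "El Gallo", "cuisine": "mexican"},
    {"name": "Tito's", "cuisine": "mexican"},
    {"name": "Guisados", "cuisine": "mexican"},
    {"name": "Bestia", "cuisine": "italian"},
    {"name": "Osteria", "cuisine": "italian"},
    {"name": "Jitlada", "cuisine": "thai"},
]

print(toy_facet_counts(docs, "cuisine"))
```

These counts are what drive the "Mexican (3), Italian (2)" style filter links on a results page.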

RESOURCES

LET’S SAVE YOU A GOOGLE TRIP

Solr in Action ($45), Apr 2014

Haystack Documentation: http://django-haystack.readthedocs.org/

IRC (freenode): #django, #haystack, #solr
