Ruby Day Kraków: Full Text Search with Ferret

Agnieszka Figiel, 25th November 2006

Page 1: Ruby Day Kraków: Full Text Search with Ferret

Ruby Day Kraków: Full Text Search with Ferret

Agnieszka Figiel

25th November 2006

Page 2: Ruby Day Kraków: Full Text Search with Ferret

Agenda

- full text search implementation options
- tools for Ruby
- Ferret and acts_as_ferret
- searching with Ferret
- overview of index options
- multi search
- more like this

Page 3: Ruby Day Kraków: Full Text Search with Ferret

Full Text Search

A search of a document collection, which examines all of the words in every stored document as it tries to match search words supplied by the user.

- index
  - tokenize all documents
  - filter out stop words
  - apply stemming
  - apply a term weighting scheme
- search
  - use the index to find all documents matching a query
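
The indexing and search steps listed above can be sketched in plain Ruby. This is a toy illustration, not how Ferret works internally: the stop word list, the one-rule stemmer, raw term frequency as the weighting scheme, and the names tokenize, stem, terms, build_index and search are all assumptions made for the example.

```ruby
require 'set'

# Assumed, very short stop word list.
STOP_WORDS = Set.new(%w[a an and the of in is to with])

# Lowercase the text and split it into alphanumeric tokens.
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Deliberately naive stemmer: drop a trailing "s" from words longer than three letters.
def stem(word)
  word.length > 3 && word.end_with?('s') ? word[0..-2] : word
end

# Tokenize, filter out stop words, apply stemming.
def terms(text)
  tokenize(text).reject { |w| STOP_WORDS.include?(w) }.map { |w| stem(w) }
end

# Inverted index: term => { document id => term frequency }.
def build_index(docs)
  index = Hash.new { |h, k| h[k] = Hash.new(0) }
  docs.each do |id, text|
    terms(text).each { |t| index[t][id] += 1 }
  end
  index
end

# Ids of documents containing every query term, highest summed frequency first.
def search(index, query)
  query_terms = terms(query)
  return [] if query_terms.empty?
  postings = query_terms.map { |t| index.fetch(t, {}) }
  ids = postings.map(&:keys).reduce(:&)
  ids.sort_by { |id| [-postings.sum { |p| p[id] }, id] }
end

docs = {
  1 => 'Ferret is a search library for Ruby',
  2 => 'Searching blogs with Ruby on Rails',
  3 => 'The ferret is a small animal'
}
index = build_index(docs)
```

With this data, search(index, 'ferrets') finds documents 1 and 3 because the query is stemmed to the same term that was indexed, while a query made only of stop words matches nothing.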

Page 4: Ruby Day Kraków: Full Text Search with Ferret

Database Full Text Index

- MySQL
- PostgreSQL
- MS SQL
- Oracle
- DB2

Page 5: Ruby Day Kraków: Full Text Search with Ferret

Search Systems

- Google, Yahoo
- Swish-e (C, Perl API available)
- Lucene (Java, ports for C, C++, .NET, Delphi, Perl, Python, PHP, Common Lisp, Ruby)
- Nutch (Lucene + crawler)
- Lucene-WS (Lucene via REST)
- SOLR (Lucene via XML/HTTP and JSON)

Page 6: Ruby Day Kraków: Full Text Search with Ferret

Ruby Search Systems

- Hyper Estraier
- Ferret

Page 7: Ruby Day Kraków: Full Text Search with Ferret

Ferret

http://rubyforge.org/projects/ferret

A text search engine library written for Ruby. It is inspired by the Apache Lucene Java project.

Page 8: Ruby Day Kraków: Full Text Search with Ferret

acts_as_ferret

http://projects.jkraemer.net/acts_as_ferret/wiki

a plugin for Ruby on Rails which builds on Ferret

- search across the contents of any Rails model class
- each model has its own index on disk
- search multiple models
- support for Rails Single Table Inheritance
- index attributes or virtual attributes of a model
- indexing can be customized by overriding the to_doc method
- find similar items ('more like this')

Page 9: Ruby Day Kraków: Full Text Search with Ferret

Installation

ferret gem:

gem install ferret

acts_as_ferret:

script/plugin install svn://projects.jkraemer.net/acts_as_ferret/tags/stable/acts_as_ferret

Page 10: Ruby Day Kraków: Full Text Search with Ferret

Example

YASB (Yet Another Searchable Blog)

class Post < ActiveRecord::Base
  has_many :comments
end

class Comment < ActiveRecord::Base
  belongs_to :post
end

Page 11: Ruby Day Kraków: Full Text Search with Ferret

Basic post search

Let’s add a basic search on the Post model:

class Post < ActiveRecord::Base
  has_many :comments
  acts_as_ferret
end

Search posts:

Post.find_by_contents(search_term)

After running the first search an index will be created for the Post model.
ALL fields are indexed if no additional options are given, including arrays of child objects (STI).

Page 12: Ruby Day Kraków: Full Text Search with Ferret

Limit indexed fields

To limit the fields that are indexed for a given model we can specify their list:

acts_as_ferret :fields => [ 'title', 'body' ]

NOTE: after any change to index settings, the index needs to be rebuilt.

Post.rebuild_index

Page 13: Ruby Day Kraków: Full Text Search with Ferret

Index options

There are numerous options for customising Ferret’s indexing.

Example:

acts_as_ferret(
  :fields => {
    :title => { :boost => 2 },
    :body  => { :boost => 1 }
  },
  :store_class_name => true
)

This will add a boost (importance) factor of 2 to the title field, and 1 to the body field. The class name will be stored for multiple class searches.
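
To make the effect of a boost concrete, here is a minimal plain-Ruby sketch. It does not reproduce Ferret's scoring formula; it assumes a toy model in which every occurrence of the query term in a field adds that field's boost to the score. The names BOOSTS and toy_score and the sample posts are made up for the illustration.

```ruby
# Field boosts as on the slide: title counts twice as much as body.
BOOSTS = { title: 2.0, body: 1.0 }

# Toy score (an assumption, not Ferret's formula): each occurrence of the
# term in a field contributes that field's boost.
def toy_score(doc, term)
  BOOSTS.sum do |field, boost|
    occurrences = doc[field].downcase.split.count(term.downcase)
    boost * occurrences
  end
end

posts = [
  { id: 1, title: 'Weekend trip',  body: 'We saw a ferret in the park' },
  { id: 2, title: 'Ferret basics', body: 'Indexing and searching text' }
]

# Sort best match first.
ranked = posts.sort_by { |post| -toy_score(post, 'ferret') }
ranked_ids = ranked.map { |post| post[:id] }
```

Both posts mention the term once, but post 2 mentions it in the title, so under this toy model it ranks ahead of post 1.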

Page 14: Ruby Day Kraków: Full Text Search with Ferret

Index options: store

Value         Description
:no           Don’t store field.
:yes          Store field in its original format. Use this value if you
              want to highlight matches or print match excerpts a la
              Google search.
:compressed   Store field in compressed format.

Page 15: Ruby Day Kraków: Full Text Search with Ferret

Index options: index

Value                     Description
:no                       Do not make this field searchable.
:yes                      Make this field searchable and tokenize its contents.
:untokenized              Make this field searchable but do not tokenize its
                          contents. Use this value for fields you wish to sort by.
:omit_norms               Same as :yes except omit the norms file. The norms
                          file can be omitted if you don’t boost any fields and
                          you don’t need scoring based on field length.
:untokenized_omit_norms   Same as :untokenized except omit the norms file.

Page 16: Ruby Day Kraków: Full Text Search with Ferret

Index options: term vector

Value                     Description
:no                       Don’t store term-vectors.
:yes                      Store term-vectors without storing positions or offsets.
:with_positions           Store term-vectors with positions.
:with_offsets             Store term-vectors with offsets.
:with_positions_offsets   Store term-vectors with positions and offsets.
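
As a rough illustration of what positions and offsets are, the plain-Ruby sketch below builds a term vector for one piece of text and uses the offsets to highlight a term. It is not Ferret's internal format or API; the data layout and the names term_vector and highlight are assumptions for the example.

```ruby
# For each term, record the token positions and the [start, end) character
# offsets of every occurrence in the text.
def term_vector(text)
  vector = Hash.new { |h, k| h[k] = { positions: [], offsets: [] } }
  position = 0
  text.scan(/\w+/) do |token|
    match = Regexp.last_match
    entry = vector[token.downcase]
    entry[:positions] << position
    entry[:offsets] << [match.begin(0), match.end(0)]
    position += 1
  end
  vector
end

# Offsets point back into the original text, which is what makes
# highlighting possible.
def highlight(text, vector, term)
  result = text.dup
  # Work from the last occurrence backwards so earlier offsets stay valid.
  vector.fetch(term, { offsets: [] })[:offsets].reverse_each do |from, to|
    result[from...to] = "*#{result[from...to]}*"
  end
  result
end

text = 'Ferret indexes text. Ferret searches text.'
vector = term_vector(text)
```

Here vector['ferret'] holds positions [0, 3] and offsets [[0, 6], [21, 27]], and highlight(text, vector, 'ferret') wraps both occurrences in asterisks.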

Page 17: Ruby Day Kraków: Full Text Search with Ferret

Index options: boost

Value   Description
Float   The boost property is used to set the default boost for a field.
        This boost value will be used for all instances of the field in
        the index unless otherwise specified when you create the field.
        All values should be positive.

Page 18: Ruby Day Kraków: Full Text Search with Ferret

Search the comments

Searching a model and its related models can be achieved with virtual attributes.

A getter of all comment messages, defined in the Post class:

def post_comments
  comments.collect { |c| c.message }.join(' ')
end

Add it to ferret’s field list like a normal field:

acts_as_ferret :fields => [ 'title', 'body', 'post_comments' ]
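
The virtual attribute can be tried out without Rails. The sketch below uses plain Ruby objects as stand-ins for the ActiveRecord models and only models the attributes used on the slides; the indexed_fields helper is hypothetical and merely shows what an indexer would read for a given field list.

```ruby
# Stand-in for the Comment model (assumption: only :message is needed).
Comment = Struct.new(:message)

# Stand-in for the Post model.
class Post
  attr_reader :title, :body, :comments

  def initialize(title, body, comments = [])
    @title = title
    @body = body
    @comments = comments
  end

  # Virtual attribute: all comment messages joined into one string.
  def post_comments
    comments.collect { |c| c.message }.join(' ')
  end

  # Hypothetical helper: the field values an indexer would see.
  def indexed_fields(fields)
    fields.map { |name| [name, public_send(name)] }.to_h
  end
end

post = Post.new('Ferret', 'Full text search',
                [Comment.new('Great talk'), Comment.new('Thanks')])
doc = post.indexed_fields(%w[title body post_comments])
```

Because post_comments responds like any other attribute, the text of the comments ends up in the same document as the title and body, so a search on posts also matches words that only appear in their comments.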

Page 19: Ruby Day Kraków: Full Text Search with Ferret

Search in multiple models

In case we would like to search for both comments and posts (multi search) we need to:

- create an index for both models
- for each of them set the :store_class_name flag

After rebuilding indices for Post and Comment we can run a multi search on both:

Post.multi_search(params[:search], [Comment])
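
Why the class name has to be stored can be shown with a toy model in plain Ruby. Ids are only unique within one model, so a hit from a combined search needs the class name to be mapped back to a record. The INDEXES hash and toy_multi_search are hypothetical and do not reflect how acts_as_ferret implements multi search.

```ruby
# One toy index per model: record id => indexed text.
INDEXES = {
  'Post'    => { 1 => 'full text search with ferret', 2 => 'rails plugins' },
  'Comment' => { 1 => 'nice post',                     2 => 'ferret is fast' }
}

# Search every named index and tag each hit with the class it came from.
def toy_multi_search(term, class_names)
  class_names.flat_map do |class_name|
    matching = INDEXES.fetch(class_name).select { |_id, text| text.split.include?(term) }
    matching.keys.map { |id| { class_name: class_name, id: id } }
  end
end

hits = toy_multi_search('ferret', %w[Post Comment])
```

The result contains Post 1 and Comment 2. Without the class name, the two ids alone would not say which table each hit belongs to.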

Page 20: Ruby Day Kraków: Full Text Search with Ferret

More like this

We would like a feature of finding the most similar posts to a chosen one.
That’s pretty simple:

post.more_like_this({
  :field_names => ['title', 'body', 'post_comments'],
  :min_term_freq => 2,
  :min_doc_freq => 3
})

The options passed here tell the search engine two things:

- take into consideration only terms that appear more than once in the source document
- take into consideration only terms that appear in at least 3 documents
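
The two thresholds can be illustrated with a small plain-Ruby sketch that picks the terms a 'more like this' query would be built from. This is a simplified model based on the description above, not the plugin's implementation; the names tokens and interesting_terms and the sample collection are assumptions.

```ruby
# Lowercase the text and split it into alphanumeric tokens.
def tokens(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Terms of the source document that occur at least min_term_freq times in it
# and appear in at least min_doc_freq documents of the collection.
def interesting_terms(source, collection, min_term_freq:, min_doc_freq:)
  term_freq = tokens(source).each_with_object(Hash.new(0)) { |t, h| h[t] += 1 }
  doc_freq = Hash.new(0)
  collection.each { |doc| tokens(doc).uniq.each { |t| doc_freq[t] += 1 } }
  selected = term_freq.select do |term, tf|
    tf >= min_term_freq && doc_freq[term] >= min_doc_freq
  end
  selected.keys.sort
end

collection = [
  'ferret search ferret index',
  'ferret index tips',
  'rails search with ferret',
  'cooking pasta'
]
source = collection[0]

strict = interesting_terms(source, collection, min_term_freq: 2, min_doc_freq: 3)
loose  = interesting_terms(source, collection, min_term_freq: 1, min_doc_freq: 2)
```

With the slide's settings only 'ferret' survives, since it is the one term that appears twice in the source and in three documents. Lowering both thresholds also lets 'index' and 'search' through.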

Page 21: Ruby Day Kraków: Full Text Search with Ferret

Links

Products:

- Swish-e http://swish-e.org/index.html
- Lucene http://lucene.apache.org/java/docs/index.html
- Nutch http://lucene.apache.org/nutch/
- Lucene-WS http://lucene-ws.sourceforge.net/
- SOLR http://incubator.apache.org/solr/
- Hyper Estraier http://hyperestraier.sourceforge.net/
- Ferret http://rubyforge.org/projects/ferret
- acts_as_ferret http://projects.jkraemer.net/acts_as_ferret/

Reading:

- tutorial by Roman Mackovcak: http://blog.zmok.net/articles/2006/10/18/full-text-search-in-ruby-on-rails-3-ferret
- tutorial by Seth Fitzsimmons: http://mojodna.net/searchable/ruby/railsconf.pdf
- aaf and Unicode by Albert Ramstedt: http://albert.delamednoll.se/articles/2005/12/20/the-ferret-plugin-with-simple-unicode-support

Page 22: Ruby Day Kraków: Full Text Search with Ferret

Thank you!

Good luck using ferret!
