Ruby Day Kraków: Full Text Search with Ferret
TRANSCRIPT
Ruby Day Kraków: Full Text Search with Ferret
Agnieszka Figiel
25th November 2006
Agenda
- full text search implementation options
- tools for Ruby
- Ferret and acts_as_ferret
- searching with Ferret
- overview of index options
- multi search
- more like this
Full Text Search
A search of a document collection that examines all of the words in every stored document as it tries to match the search words supplied by the user.
- index
  - tokenize all documents
  - filter out stop words
  - apply stemming
  - apply a term weighting scheme
- search
  - use the index to find all documents matching a query
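The index and search steps above can be sketched in plain Ruby. This is a toy illustration, not how Ferret is implemented: the stop word list, the suffix-stripping `stem` helper, the `ToyIndex` class and the tf-idf style weighting are all assumptions made for the example.

```ruby
# Toy in-memory full text index (illustration only, not Ferret's code).
STOP_WORDS = %w[a an and the of in is it to with].freeze

# Deliberately crude stemmer (assumption): strips a few common suffixes.
def stem(word)
  word.sub(/(ing|ed|es|s)\z/, '')
end

# Tokenize, filter out stop words, apply stemming.
def analyze(text)
  text.downcase.scan(/[a-z0-9]+/)
      .reject { |word| STOP_WORDS.include?(word) }
      .map { |word| stem(word) }
end

class ToyIndex
  def initialize
    # term => { doc_id => term frequency }
    @postings = Hash.new { |hash, term| hash[term] = Hash.new(0) }
    @doc_count = 0
  end

  def add(doc_id, text)
    @doc_count += 1
    analyze(text).each { |term| @postings[term][doc_id] += 1 }
  end

  # Returns document ids ranked by a simple tf-idf weight.
  def search(query)
    scores = Hash.new(0.0)
    analyze(query).each do |term|
      next unless @postings.key?(term)
      docs = @postings[term]
      idf = Math.log(@doc_count.to_f / docs.size) + 1.0
      docs.each { |doc_id, tf| scores[doc_id] += tf * idf }
    end
    scores.sort_by { |doc_id, score| [-score, doc_id] }.map(&:first)
  end
end

index = ToyIndex.new
index.add(1, 'Searching with Ferret')
index.add(2, 'The ferret is a small animal')
index.add(3, 'Full text search in Ruby')

p index.search('search')  # prints [1, 3]
```

Stemming is what lets the query "search" match "Searching" in document 1.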
Database Full Text Index
- MySQL
- PostgreSQL
- MS SQL
- Oracle
- DB2
Search Systems
- Google, Yahoo
- Swish-e (C, Perl API available)
- Lucene (Java, ports for C, C++, .NET, Delphi, Perl, Python, PHP, Common Lisp, Ruby)
- Nutch (Lucene + crawler)
- Lucene-WS (Lucene via REST)
- SOLR (Lucene via XML/HTTP and JSON)
Ruby Search Systems
- Hyper Estraier
- Ferret
Ferret
http://rubyforge.org/projects/ferret
A text search engine library written for Ruby, inspired by the Apache Lucene Java project.
acts_as_ferret
http://projects.jkraemer.net/acts_as_ferret/wiki
A plugin for Ruby on Rails which builds on Ferret.
- search across the contents of any Rails model class
- each model has its own index on disk
- search multiple models
- support for Rails Single Table Inheritance
- index attributes or virtual attributes of a model
- indexing can be customized by overriding the to_doc method
- find similar items ('more like this')
Installation
ferret gem:
gem install ferret
acts_as_ferret:
script/plugin install svn://projects.jkraemer.net/acts_as_ferret/tags/stable/acts_as_ferret
Example
YASB (Yet Another Searchable Blog)

class Post < ActiveRecord::Base
  has_many :comments
end

class Comment < ActiveRecord::Base
  belongs_to :post
end
Basic post search
Let’s add a basic search on the Post model:

class Post < ActiveRecord::Base
  has_many :comments
  acts_as_ferret
end

Search posts:

Post.find_by_contents(search_term)

After running the first search, an index will be created for the Post model.
ALL fields are indexed if no additional options are given, including arrays of child objects (STI).
Limit indexed fields
To limit the fields that are indexed for a given model, we can list them explicitly:

acts_as_ferret :fields => [ 'title', 'body' ]

NOTE: after any change to index settings, the index needs to be rebuilt:

Post.rebuild_index
Index options
Ferret’s indexing can be customised with numerous options.

Example:

acts_as_ferret(
  :fields => {
    :title => { :boost => 2 },
    :body  => { :boost => 1 }
  },
  :store_class_name => true
)

This will add a boost (importance) factor of 2 to the title field, and 1 to the body field. The class name will be stored for multiple-class searches.
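A rough way to picture what the boost factors do, assuming a match in each field contributes a score that is multiplied by that field's boost. The `boosted_score` helper and the per-field scores are made up for illustration; Ferret's real scoring takes more factors into account.

```ruby
# Hypothetical helper: combine per-field match scores using field boosts.
# Fields without an explicit boost default to 1.0.
def boosted_score(field_scores, boosts)
  field_scores.sum { |field, score| score * boosts.fetch(field, 1.0) }
end

boosts = { :title => 2.0, :body => 1.0 }

# Two documents with the same raw match score, in different fields.
title_hit = { :title => 0.5, :body => 0.0 }
body_hit  = { :title => 0.0, :body => 0.5 }

puts boosted_score(title_hit, boosts)  # prints 1.0
puts boosted_score(body_hit, boosts)   # prints 0.5
```

With these boosts, a match in the title ranks above an equally good match in the body.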
Index options: store
Value         Description
:no           Don’t store field.
:yes          Store field in its original format. Use this value if you want to highlight matches or print match excerpts a la Google search.
:compressed   Store field in compressed format.
Index options: index
Value                     Description
:no                       Do not make this field searchable.
:yes                      Make this field searchable and tokenize its contents.
:untokenized              Make this field searchable but do not tokenize its contents. Use this value for fields you wish to sort by.
:omit_norms               Same as :yes except omit the norms file. The norms file can be omitted if you don’t boost any fields and you don’t need scoring based on field length.
:untokenized_omit_norms   Same as :untokenized except omit the norms file.
Index options: term_vector
Value                     Description
:no                       Don’t store term-vectors.
:yes                      Store term-vectors without storing positions or offsets.
:with_positions           Store term-vectors with positions.
:with_offsets             Store term-vectors with offsets.
:with_positions_offsets   Store term-vectors with positions and offsets.
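To make the difference between positions and offsets concrete, here is a small plain-Ruby sketch of what a term vector with both could hold for one field. The `TermVectorEntry` struct and `term_vector` helper are hypothetical and do not mirror Ferret's classes; positions are token indexes and offsets are character ranges in the original text.

```ruby
# Hypothetical structure: one entry per distinct term in a field.
TermVectorEntry = Struct.new(:term, :positions, :offsets)

def term_vector(text)
  entries = Hash.new { |hash, term| hash[term] = TermVectorEntry.new(term, [], []) }
  position = 0
  text.scan(/\w+/) do |token|
    match = Regexp.last_match          # match data for the current token
    entry = entries[token.downcase]
    entry.positions << position        # index of the token in the token stream
    entry.offsets << [match.begin(0), match.end(0)]  # character start/end
    position += 1
  end
  entries.values
end

term_vector('ferret likes ferret food').each do |entry|
  puts "#{entry.term}: positions=#{entry.positions.inspect} offsets=#{entry.offsets.inspect}"
end
```

Positions are what phrase matching needs, while offsets point back into the stored text, which is useful for highlighting.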
Index options: boost
Value   Description
Float   The boost property is used to set the default boost for a field. This boost value will be used for all instances of the field in the index unless otherwise specified when you create the field. All values should be positive.
Search the comments
Searching a model and its related models can be achieved with virtual attributes.

A getter for all comment messages, defined in the Post class:

def post_comments
  comments.collect { |c| c.message }.join(' ')
end

Add it to Ferret’s field list like a normal field:

acts_as_ferret :fields => [ 'title', 'body', 'post_comments' ]
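The idea can be tried outside Rails with plain Ruby stand-ins. `PlainPost`, `PlainComment` and the `indexed_values` helper below are hypothetical names for this sketch; they only show that a virtual attribute is read like any other field when the document to index is assembled.

```ruby
# Plain Ruby stand-ins for the ActiveRecord models (assumption: no Rails here).
PlainComment = Struct.new(:message)

class PlainPost
  attr_reader :title, :body, :comments

  def initialize(title, body, comments = [])
    @title = title
    @body = body
    @comments = comments
  end

  # Virtual attribute: all comment messages joined into one string.
  def post_comments
    comments.collect { |c| c.message }.join(' ')
  end

  # Hypothetical helper: collect the values of the listed fields,
  # calling the getter of the same name for each one.
  def indexed_values(fields)
    fields.each_with_object({}) { |field, doc| doc[field] = public_send(field) }
  end
end

post = PlainPost.new('Ferret', 'Full text search for Ruby',
                     [PlainComment.new('Nice post'), PlainComment.new('Thanks')])

p post.indexed_values(%w[title body post_comments])
```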
Search in multiple models
To search both comments and posts at once (multi search), we need to:
- create an index for both models
- set the :store_class_name flag for each of them

After rebuilding the indices for Post and Comment, we can run a multi search on both:

Post.multi_search(params[:search], [Comment])
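One way to see why the class name has to be stored: hits from several per-model indexes end up in a single ranked list, and each hit must say which model it belongs to. The sketch below uses a hypothetical `Hit` struct and `merge_hits` helper with made-up scores; it is not how acts_as_ferret performs the search internally.

```ruby
# Hypothetical hit record: the stored class name tells us which model to load.
Hit = Struct.new(:class_name, :id, :score)

# Merge hit lists from several indexes into one list, best score first.
def merge_hits(*hit_lists)
  hit_lists.sum([]).sort_by { |hit| [-hit.score, hit.class_name, hit.id] }
end

post_hits    = [Hit.new('Post', 1, 0.9), Hit.new('Post', 7, 0.4)]
comment_hits = [Hit.new('Comment', 3, 0.7)]

merged = merge_hits(post_hits, comment_hits)
merged.each { |hit| puts "#{hit.class_name}##{hit.id} (#{hit.score})" }
```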
More like this
We would like a feature that finds the posts most similar to a chosen one.
That’s pretty simple:

post.more_like_this({
  :field_names => ['title', 'body', 'post_comments'],
  :min_term_freq => 2,
  :min_doc_freq => 3
})

The options passed here tell the search engine two things:
- take into consideration only terms that appear more than once in the source document
- take into consideration only terms that appear in at least 3 documents
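The effect of the two thresholds can be sketched in plain Ruby. The `tokens` and `candidate_terms` helpers below are hypothetical and only illustrate the filtering described above; the real plugin builds its query from the index rather than from raw strings.

```ruby
def tokens(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Keep only terms that occur at least min_term_freq times in the source
# document and in at least min_doc_freq documents of the collection.
def candidate_terms(source_text, all_texts, min_term_freq:, min_doc_freq:)
  term_freq = Hash.new(0)
  tokens(source_text).each { |term| term_freq[term] += 1 }

  doc_freq = Hash.new(0)
  all_texts.each { |text| tokens(text).uniq.each { |term| doc_freq[term] += 1 } }

  term_freq.select { |term, tf| tf >= min_term_freq && doc_freq[term] >= min_doc_freq }
           .keys.sort
end

docs = [
  'ruby ruby search',    # source document
  'ruby search engine',
  'ruby on rails',
  'search tips search'
]

p candidate_terms(docs[0], docs, min_term_freq: 2, min_doc_freq: 3)  # prints ["ruby"]
```

Here "search" is dropped because it appears only once in the source document, even though it occurs in three documents.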
Links
Products:
- Swish-e http://swish-e.org/index.html
- Lucene http://lucene.apache.org/java/docs/index.html
- Nutch http://lucene.apache.org/nutch/
- Lucene-WS http://lucene-ws.sourceforge.net/
- SOLR http://incubator.apache.org/solr/
- Hyper Estraier http://hyperestraier.sourceforge.net/
- Ferret http://rubyforge.org/projects/ferret
- acts_as_ferret http://projects.jkraemer.net/acts_as_ferret/

Reading:
- tutorial by Roman Mackovcak: http://blog.zmok.net/articles/2006/10/18/full-text-search-in-ruby-on-rails-3-ferret
- tutorial by Seth Fitzsimmons: http://mojodna.net/searchable/ruby/railsconf.pdf
- aaf and Unicode by Albert Ramstedt: http://albert.delamednoll.se/articles/2005/12/20/the-ferret-plugin-with-simple-unicode-support
Thank you!
Good luck using ferret!