Advanced Search with Lucene
Introductions
Presenters:
Chris Pliakas – Engineer
Erich Beyrent – VP of Engineering
http://www.commonplaces.com
Presentation Summary
•The problem with search
•What is Lucene?
•The Search Lucene API module
•Advanced usage
•Implementing the API
•The state of SLAPI and where it is going
Problem - Common Search Requests
Advanced query syntax
High-performance, scalable
Ability to add custom facets
Multisite search, content not shared
Managed through Drupal admin interface
Analysis of the Core Search
Pros:
•Good API ... for the most part
•Pure PHP solution
•Works out of the box
Cons:
•Elementary query syntax
•Not scalable
•No good method to alter the query
•Adding facets is unwieldy
What is Lucene?
An open source text search library written in Java
High-performance and full-featured
Supported by the Apache Software Foundation
is ...
Capabilities of Lucene
•Ranked search results
•Boolean AND, AND NOT, OR
•Fielded data search
•Powerful query types: wildcard, fuzzy, range, boost
•Field and term grouping
•Index on filesystem, no SQL
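To make the Boolean and fielded-search bullets concrete, here is a toy inverted index in Python. It is a conceptual sketch only, not Lucene's or Zend_Search_Lucene's implementation. The document set and the `search()` parameter names are made up for illustration, and the "ranking" just counts matched clauses, whereas Lucene's real scoring is far more sophisticated.

```python
from collections import defaultdict

# Toy documents with named fields, loosely like Lucene documents (illustrative data).
DOCS = {
    1: {"title": "Drupal search module", "type": "page"},
    2: {"title": "Lucene search library", "type": "story"},
    3: {"title": "Drupal theming guide", "type": "story"},
}


def build_index(docs):
    """Build a fielded inverted index: (field, term) -> set of document ids."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for field, text in fields.items():
            for term in text.lower().split():
                index[(field, term)].add(doc_id)
    return index


def search(index, all_ids, all_of=(), any_of=(), none_of=()):
    """Boolean search over (field, term) clauses.

    all_of  -> AND      (every clause must match)
    any_of  -> OR       (at least one clause must match)
    none_of -> AND NOT  (no clause may match)
    """
    candidates = set(all_ids)
    for clause in all_of:
        candidates &= index.get(clause, set())
    if any_of:
        candidates &= set().union(*(index.get(c, set()) for c in any_of))
    for clause in none_of:
        candidates -= index.get(clause, set())

    # Crude ranking: more matched positive clauses means a higher rank.
    positive = list(all_of) + list(any_of)

    def score(doc_id):
        return sum(doc_id in index.get(c, set()) for c in positive)

    return sorted(candidates, key=lambda d: (-score(d), d))
```

For example, `search(build_index(DOCS), DOCS, all_of=[("title", "search")], none_of=[("type", "story")])` expresses "title contains *search* AND NOT type is *story*" and returns `[1]`.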
Goals of Search Lucene API
•Integrate Lucene into Drupal
•API for Lucene backend, define hooks
•Implement and extend core Search API
•Easy to install, no external services
•Native PHP solution
Drupal ninjas use hooks, and we don't want to upset ninjas.
Where's the PHP?
What is the Zend Framework?
Well-documented, tested, E_STRICT compliant
ZF's Zend_Search_Lucene component
Object-oriented PHP port of Lucene
Lucene index is binary compatible with the Java version
Stripped-down version of the required components
Expertly Decaffeinated by the
Installation
•Download Search Lucene API from drupal.org: http://drupal.org/project/luceneapi
•Download the ZF components from SourceForge.net
•Enable the modules: Search, Search Lucene API, Search Lucene Content
Run !!
Your site search now rocks.
Configuring Search Lucene API
Hijacking the core search box
Error handling settings
Search Lucene Content settings
Configuring facets
No kittens were harmed in the making of the D6 version of Search Lucene API.
Performance Testing
Search Lucene API vs. core Search vs. Apache Solr
Memory consumption
Page load time
Index maintenance operations
Comparison With Other Engines
Improving Lucene Performance
Search results caching
Result set limit
Index optimization
Performance Settings
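Search results caching and a result set limit work together: identical queries skip the index entirely, and each cached entry stays small. The Python sketch below shows the idea with a least-recently-used cache keyed on the normalized query plus the limit. It does not reflect how Search Lucene API actually stores its cache; `ResultCache`, `cached_search()`, and the default limit of 50 are hypothetical.

```python
from collections import OrderedDict


class ResultCache:
    """A small least-recently-used cache for search results (hypothetical)."""

    def __init__(self, max_entries=100):
        self.max_entries = max_entries
        self._entries = OrderedDict()

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as recently used
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the least recently used


def cached_search(keys, run_search, cache, limit=50):
    """Return at most `limit` results, reusing cached results when possible."""
    # Normalize case and whitespace so equivalent queries share one cache entry.
    cache_key = (" ".join(keys.lower().split()), limit)
    results = cache.get(cache_key)
    if results is None:
        # A real engine applies the limit while collecting hits; slicing here
        # only illustrates that fewer results are returned and cached.
        results = run_search(keys)[:limit]
        cache.put(cache_key, results)
    return results
```

Because the limit is part of the cache key, changing the result set limit setting never serves stale, differently-sized result lists.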
Maintaining Lucene With Drush
Who needs cron?
Performing common maintenance tasks
Retrieving index information
Updating “gotcha”
The future of Drush integration
Before we start developing ...
PHP 5 Language Constructs
Objects passed by reference
Exceptional error handling with Exceptions
Autoload implementation
Abstraction layer for common ZF objects
Faceted Search
“A faceted classification system allows the assignment of multiple classifications to an object, enabling the classifications to be ordered in multiple ways, rather than in a single, pre-determined, taxonomic order.”
~Wikipedia
“Wikipedia is the best thing ever. Anyone in the world can write anything they want about any subject. So you know you are getting the best possible information.” ~Michael Scott
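In practice, "multiple classifications per object" means a result set can be counted and narrowed along several independent fields. The Python sketch below computes facet counts and drills down on one value. The result data and field names (`type`, `tags`) are illustrative assumptions, not Search Lucene API's storage format.

```python
from collections import Counter

# Each result may carry several classifications (facet values) at once.
RESULTS = [
    {"title": "Intro to Lucene", "type": "page", "tags": ["lucene", "search"]},
    {"title": "Drupal hooks", "type": "story", "tags": ["drupal"]},
    {"title": "Search tuning", "type": "story", "tags": ["search", "drupal"]},
]


def facet_counts(results, field):
    """Count how many results carry each value of `field`."""
    counts = Counter()
    for doc in results:
        value = doc.get(field, [])
        values = [value] if isinstance(value, str) else value
        counts.update(set(values))  # count each value once per document
    return dict(counts)


def drill_down(results, field, value):
    """Keep only results classified under `value` for `field`."""
    def has_value(doc):
        v = doc.get(field, [])
        return v == value if isinstance(v, str) else value in v
    return [doc for doc in results if has_value(doc)]
```

Here `facet_counts(RESULTS, "tags")` gives `{"lucene": 1, "search": 2, "drupal": 2}`, and after `drill_down(RESULTS, "tags", "drupal")` the `type` facet shows only `{"story": 2}`: the same objects, ordered along a different axis.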
Creating Facets
Why the Facet API makes sense
hook_luceneapi_facet($op, $module, $type)
Handling facets via “facet handler” callback
How to $_GET facet values
Defining multiple facets in one hook.
Advanced facets on Twolia
Creating a Search Lucene API Facet Module
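The hook pattern above can be approximated in Python. Each module's hook returns one or more facet definitions, each definition names a facet handler callback, and the active values come from the query string (the `$_GET` analog). The slides do not show what `hook_luceneapi_facet()` returns, so the `"fields"` operation, the definition keys (`title`, `element`, `handler`), and the module function names below are all hypothetical.

```python
from urllib.parse import parse_qs


def term_handler(field, values):
    """Hypothetical facet handler: turn selected values into query clauses."""
    return [f'{field}:"{v}"' for v in values]


# Hypothetical hook implementations from two modules. In Drupal these would be
# PHP functions named <module>_luceneapi_facet($op, $module, $type).
def content_facets(op, module, type_):
    if op != "fields":  # hypothetical $op value
        return {}
    # One hook can define several facets at once.
    return {
        "type": {"title": "Content type", "element": "type", "handler": term_handler},
        "uid": {"title": "Author", "element": "uid", "handler": term_handler},
    }


def taxonomy_facets(op, module, type_):
    if op != "fields":
        return {}
    return {"tid": {"title": "Category", "element": "tid", "handler": term_handler}}


HOOK_IMPLEMENTATIONS = [content_facets, taxonomy_facets]


def invoke_all(op, module, type_):
    """Merge the facet definitions returned by every implementation."""
    merged = {}
    for hook in HOOK_IMPLEMENTATIONS:
        merged.update(hook(op, module, type_))
    return merged


def facet_clauses(query_string, facets):
    """Read active facet values from the query string ($_GET analog) and
    let each facet's handler callback turn them into query clauses."""
    params = parse_qs(query_string)
    clauses = {}
    for name, spec in facets.items():
        values = params.get(spec["element"])
        if values:
            clauses[name] = spec["handler"](spec["element"], values)
    return clauses
```

Keeping the definition (the hook) separate from the behavior (the handler) lets one module declare many facets while reusing a single callback.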
How the Facet API Works
Converting $_POST to $_GET
Facet hook invoked in luceneapi_form_alter()
Callbacks invoked in luceneapi_search('search')
Facet queries appended as required subqueries
Very similar to the core Search
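Two steps of that flow lend themselves to a short sketch: redirecting a submitted form to a GET URL that carries the facet values, and appending each facet as a required subquery. In Lucene query syntax, a leading `+` marks a clause as required. The Python below builds a query string rather than query objects. The function names and example paths are hypothetical, and values are assumed to contain no double quotes, since no escaping is done.

```python
from urllib.parse import urlencode


def post_to_get(path, post_data, facet_elements):
    """Build the redirect URL that carries submitted facet values in the
    query string, so filtered result pages can be bookmarked and shared."""
    params = [(name, value)
              for name in facet_elements
              for value in post_data.get(name, [])]
    return f"{path}?{urlencode(params)}" if params else path


def build_query(user_keys, facet_values):
    """Append each facet as a required (+) subquery to the user's query.

    Values are assumed to contain no double quotes (no escaping is done).
    """
    clauses = [f"+({user_keys})"]
    for field in sorted(facet_values):
        # Several values for one facet: at least one of them must match.
        terms = " ".join(f'{field}:"{v}"' for v in facet_values[field])
        clauses.append(f"+({terms})")
    return " ".join(clauses)
```

For instance, `build_query("drupal search", {"type": ["page"]})` yields `+(drupal search) +(type:"page")`, so every hit must match both the user's keywords and the selected facet.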
Extending Search Lucene Content
Index Hooks
•hook_luceneapi_document_alter($doc, $module, $type)
•hook_luceneapi_document_delete($item, $module, $type)
“Useful for adding extra fields for faceted searches, and for filtering which data can be deleted from the index”
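Here is an example of a document alter hook that adds an extra field for faceting. Like `$doc` in the PHP signature, the document is modified in place (PHP 5 passes objects by handle), so the hook needs no return value. This is a Python sketch: the `Document` class, the hook name, the `year` field, and `index_node()` are hypothetical stand-ins, not Zend_Search_Lucene or Search Lucene API code.

```python
from datetime import datetime, timezone


class Document:
    """Minimal stand-in for a Lucene document: named fields with string values."""

    def __init__(self):
        self.fields = {}

    def add_field(self, name, value):
        self.fields[name] = value


def year_facet_document_alter(doc, module, type_):
    """Hypothetical alter hook: derive a 'year' field from 'created' for faceting."""
    created = int(doc.fields.get("created", 0))
    doc.add_field("year", str(datetime.fromtimestamp(created, tz=timezone.utc).year))


ALTER_HOOKS = [year_facet_document_alter]


def index_node(node, module="luceneapi_node", type_="node"):
    """Build a document for a node, then let every alter hook modify it."""
    doc = Document()
    doc.add_field("nid", str(node["nid"]))
    doc.add_field("title", node["title"])
    doc.add_field("created", str(node["created"]))
    for hook in ALTER_HOOKS:
        hook(doc, module, type_)  # doc is mutated in place
    return doc
```

A node created at Unix timestamp 1230768000 (1 January 2009, UTC) gets a `year` field of `"2009"`, which a facet can then count and filter on.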
Extending Search Lucene Content
Search Hooks
•hook_luceneapi_query_alter($query, $module, $type)
•hook_luceneapi_result_alter(&$result, $module, $type)
•hook_luceneapi_positive_keys($keys, $module, $type)
“Useful for modifying the final search query and the information displayed in the results”
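The two alter phases can be sketched in Python: a query alter hook adds a prohibited (`-`) clause to the final query, and a result alter hook adds extra information to each displayed result. The PHP signatures pass `$query` without `&` (an object) and `&$result` by reference (an array); in Python, mutable objects and dicts are both modified in place. `BooleanQuery`, the hook names, and the hard-coded hit are hypothetical.

```python
class BooleanQuery:
    """Minimal stand-in for a boolean query: a list of signed clauses."""

    def __init__(self):
        self.clauses = []  # (sign, text); sign is "+", "-" or ""

    def add_subquery(self, text, sign=""):
        self.clauses.append((sign, text))

    def __str__(self):
        return " ".join(f"{sign}{text}" for sign, text in self.clauses)


def hide_forum_query_alter(query, module, type_):
    """Hypothetical hook: prohibit forum posts from every search."""
    query.add_subquery('type:"forum"', sign="-")


def label_result_alter(result, module, type_):
    """Hypothetical hook: add the content type to the result's extra info."""
    result.setdefault("extra", []).append(f"Type: {result['type']}")


def run_search(keys, module="luceneapi_node", type_="node"):
    query = BooleanQuery()
    query.add_subquery(f"({keys})", sign="+")
    hide_forum_query_alter(query, module, type_)  # query alter phase
    # Pretend the engine returned this hit for the altered query.
    results = [{"title": "Drupal search", "type": "page"}]
    for result in results:
        label_result_alter(result, module, type_)  # result alter phase
    return str(query), results
```

`run_search("drupal")` returns the query `+(drupal) -type:"forum"` and one result whose `extra` list is `["Type: page"]`.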
Creating a Search Lucene API Module
•Core search hooks: hook_search(), hook_update_index()
•Search Lucene API hooks: hook_luceneapi_index($op)
“Search Lucene API is an extension of the core Search API”
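Because Search Lucene API extends the core Search API, a new module starts from the core contract. In Drupal 6, `hook_search()` dispatches on `$op` values such as `'name'`, `'reset'`, `'status'`, and `'search'`, and cron calls `hook_update_index()` to index content in batches. The Python transliteration below shows that control flow with in-memory stand-ins. The data, `BATCH_SIZE`, and the matching logic are hypothetical, and `hook_luceneapi_index($op)` is not modeled because its operations are not listed here.

```python
# In-memory stand-ins for site content and the set of indexed items.
NODES = {1: "Drupal search", 2: "Lucene library", 3: "Zend Framework"}
INDEXED = set()
BATCH_SIZE = 2  # hypothetical number of items indexed per cron run


def mymodule_search(op, keys=None):
    """Python analog of a Drupal 6 hook_search() implementation."""
    if op == "name":
        return "My content"
    if op == "reset":  # mark everything for reindexing
        INDEXED.clear()
        return None
    if op == "status":
        return {"remaining": len(NODES) - len(INDEXED), "total": len(NODES)}
    if op == "search":
        words = keys.lower().split()
        return [{"nid": nid, "title": title}
                for nid, title in sorted(NODES.items())
                if nid in INDEXED and all(w in title.lower() for w in words)]
    return None


def mymodule_update_index():
    """Python analog of hook_update_index(): index the next batch."""
    for nid in sorted(set(NODES) - INDEXED)[:BATCH_SIZE]:
        INDEXED.add(nid)
```

After a `'reset'` and one `mymodule_update_index()` call, `'status'` reports one of three items remaining, and content is searchable only once it has been indexed.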
Search Lucene API 2.0
Process control extension
Forking the search processes
Index opened only once on startup
Drupal module becomes the application
Addressing scalability
Search Lucene API 2.0
User, help, multisite search
Result sorting
User defined weights and boost factors
Better index statistics
Improved caching mechanism
New Features
Recap
Replace core search with Search Lucene API
Install, configure, and tune SLAPI modules
Maintain indexes via Drush
Use and extend Search Lucene API
In Summary ...