things made easy: one click cms integration with solr & drupal
DESCRIPTION
Presented by Peter Wolanin | Acquia, Inc - See conference video - http://www.lucidimagination.com/devzone/events/conferences/lucene-revolution-2012
If you have a new web project or an existing Drupal site, the combination of Drupal and Apache Solr is both powerful and easy to set up thanks to the existing integration code. The module allows for substantial customization through the administrative UI. Drupal facilitates further customization of the UI, indexing, and boosting because its open architecture provides multiple opportunities for custom code to alter the behavior. A couple of code snippets will be followed by a review of other contributed Drupal modules that further enhance the search capability. Finally, this session will showcase some examples of Drupal sites using Solr, including Acquia's own sites and many well-known enterprise and government sites.
TRANSCRIPT
Things Made Easy: One Click CMS Integration with Solr & Drupal
Peter M. Wolanin, Ph.D.
Momentum Specialist (principal engineer), Acquia, Inc.
Drupal contributor drupal.org/user/49851
Co-maintainer of the Drupal Apache Solr Search Integration module
May 10, 2012
Key Questions to Be Answered
• What is Drupal?
• What Apache Solr features are integrated with Drupal?
• Why is Drupal plus Apache Solr better than starting from scratch?
• What elements of the search can you configure in the UI without code?
Why Are You Here?
• You are starting a new website project?
• You are wondering how hard it is to actually integrate Apache Solr with a website?
• You already use Drupal but not with Apache Solr?
• You like things that are easy yet powerful?
Drupal: Web Application Framework + CMS == Social Publishing Platform
[Diagram: Content Mgmt Systems and Social Software Tools, labeled with: blogs / wikis, forums / comments, social ranking, social tagging, users, social networks, workflow, taxonomy, semantic web, RSS, content, analytics]
Drupal “… is as much a Social Software platform as it is a web content management system.”
CMS Watch, The Web CMS Report 2009
Drupal + Solr Provides Immediate Access to Rich Search Features
• Dynamic content requires dynamic navigation - which is provided by an effective search
• Search facets mean no dead ends
• Solr provides better keyword relevancy in results
• Much faster searches for sites with lots of content
• By avoiding database queries, Drupal with Solr scales better
DEMO: A Drupal 7 partial copy of the conference site with Apache Solr integration
http://youtu.be/yY6kma_ViWc
Drupal Has User Accounts, Roles & Permissions
• Define custom roles
• Set granular access controls by role
• Configure user behavior:
  – Registration
  – Email
  – Profiles
  – Pictures
Drupal Modules Add Functionality
• “There’s a module for that”
• More than 4100 Drupal 7 community modules
• Often controlled by role-based permissions
• Drupal core and modules are GPL v2+, and have a huge, active community
Drupal is Written in PHP, Which Makes for Easy Customization
• The Drupal architecture encourages customization and provides many avenues for it: you write modules rather than patching Drupal core.
• Drupal has a huge community of users. Approximately 10,000 sites report to Drupal.org that they use the Apache Solr Search Integration module.
Drupal Adapts to You!!
Drupal Entities are Content + Data
[Diagram: grid of nodes, Node 1 through Node 9]
• Nodes are the basic entity used for text content
• The entity system is extensible - can represent any data
• Examples of data stored within Drupal entities
  – Text
  – Geographic location
  – Node reference
Entity Types are Enriched With User-configurable Data Fields
• Define new data fields on a node using the Field API module.
  – Text, images, integers, date, reference, etc.
• Flexible and configurable in the UI
• No programming required (many existing modules)
A Strong Framework for Content Classification
• Core taxonomy system
• Modules provide taxonomy-based appearance, access control
• Standard input options include free tagging, flat-controlled, and hierarchical-controlled
Drupal + Solr Search for Business, Government and NGOs
http://www.mattel.com/search/apachesolr_search/
http://www.hrw.org/en/search/apachesolr_search/
http://www.restorethegulf.gov/search/apachesolr_search/
http://www.nypl.org/search/apachesolr_search/
http://www.mylifetime.com/community/search/apachesolr_search/
http://opensource.com/search/apachesolr_search/
https://www.ethicshare.org/publications/
http://www.poly.edu/search/apachesolr_search/
https://www.eff.org/search/site/
http://www.whitehouse.gov/search/site/
http://www.emporia.edu/search/site/
Drupal Has Already Solved Many Solr Integration Challenges
• The most important - content indexing.
• Facets, sorting, and highlighting of results.
• Immediate integration with the More Like This and spell-check handlers.
• Included sub-module integrates content access permissions by indexing them and filtering Solr results based on the current user.
Easy Content Recommendation!
• Uses the MLT handler
• Picks fields from the currently viewed node
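To make the More Like This idea concrete, here is a minimal sketch in Python (not the module's PHP) of the request parameters such a lookup might send. The `mlt.fl`, `mlt.mintf`, `mlt.mindf`, `rows` and `wt` parameters are standard Solr parameters; the `id` field name, the document ID format and the chosen field list are assumptions made for illustration only.

```python
from urllib.parse import urlencode, parse_qs


def build_mlt_query(doc_id, fields, rows=5):
    """Return a URL query string for a Solr More Like This request.

    doc_id: ID of the currently viewed document (format is an assumption).
    fields: field names whose terms are used to find similar documents.
    """
    params = {
        "q": 'id:"{}"'.format(doc_id),  # assumed unique-key field name
        "mlt.fl": ",".join(fields),     # fields to mine for "interesting" terms
        "mlt.mintf": 1,                 # minimum term frequency in the source doc
        "mlt.mindf": 1,                 # minimum document frequency in the index
        "rows": rows,                   # number of similar documents to return
        "wt": "json",
    }
    return urlencode(params)


query_string = build_mlt_query("node/101", ["label", "content"])
```

The query string would be appended to whatever path the More Like This handler is registered under in solrconfig.xml.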
The Module Has a Pipeline for Indexing Drupal Content to Solr
Drupal entities are processed into one (or more) document objects. Each document object is converted to XML and sent to Solr.
[Diagram: Node object (title, nid, type) → Drupal functions → Document object (entity_type, label, entity_id, bundle) → XML string]
<doc>
  <field name="entity_type">node</field>
  <field name="label">Hello Drupal</field>
  <field name="entity_id">101</field>
  <field name="bundle">session</field>
</doc>
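The last step of the pipeline, turning a document object into the XML shown above, can be sketched in a few lines. This is an illustrative Python version, not the module's PHP code; the function name `document_to_xml` is hypothetical, and the field values are the ones from the slide.

```python
import xml.etree.ElementTree as ET


def document_to_xml(document):
    """Serialize a dict of field name -> value (or list of values) as a Solr <doc>."""
    doc = ET.Element("doc")
    for name, value in document.items():
        # Multi-valued fields become repeated <field> elements.
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(item)  # ElementTree escapes &, < and > on output
    return ET.tostring(doc, encoding="unicode")


xml_string = document_to_xml({
    "entity_type": "node",
    "label": "Hello Drupal",
    "entity_id": 101,
    "bundle": "session",
})
```

Using an XML library rather than string concatenation means titles containing characters such as `&` are escaped correctly.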
Entity Meta-data Gives Automatic Facets!
• Content types
• Taxonomy terms per vocabulary
• Content authors
• Posted and modified dates
• Text and numbers selected via select list/radios/check boxes
Drupal Modules Implement hooks to Control Indexing and Display
HOOK_apachesolr_index_document_build($document, $entity, $entity_type, $env_id)
• By creating a Drupal module (in PHP), you can implement module and theme “hooks” to extend or alter Drupal behavior.
• Change or replace the data normally indexed.
• Modify the search results and their appearance.
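The hook pattern itself is simple: the indexing code builds a document and then gives every registered implementation a chance to alter it. The following Python sketch imitates that flow; it is not Drupal's API, and the registry, the `implement` decorator, and the `speaker` field are all hypothetical names used only for illustration.

```python
# Hypothetical registry mapping a hook name to its implementations.
HOOKS = {}


def implement(hook_name):
    """Decorator that registers a function as an implementation of a hook."""
    def register(func):
        HOOKS.setdefault(hook_name, []).append(func)
        return func
    return register


def invoke_all(hook_name, *args):
    """Call every implementation of the hook with the same arguments."""
    for func in HOOKS.get(hook_name, []):
        func(*args)


@implement("index_document_build")
def add_speaker_field(document, entity, entity_type):
    # Add an extra field, but only for "session" nodes.
    if entity_type == "node" and entity.get("type") == "session":
        document["speaker"] = entity.get("speaker", "")


def build_document(entity, entity_type):
    """Build the default document, then let hook implementations alter it."""
    document = {
        "entity_type": entity_type,
        "entity_id": entity["nid"],
        "label": entity["title"],
        "bundle": entity["type"],
    }
    invoke_all("index_document_build", document, entity, entity_type)
    return document
```

Because implementations mutate the document in place, several modules can each contribute fields without knowing about one another.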
Updates to an Entity or Related Meta-data Cause Reindexing
• Drupal entities are indexed during Drupal cron (typically invoked via *nix cron).
• By using a specialized tracking table, content can automatically be queued for reindex when changed, and subsets of content can potentially be sent to different Solr indexes.
• Entities include many ID-based reference fields (e.g. the User ID of the author). Changes to the referenced data are also watched.
Indexing Tracking Tables Maintain Order
+-------------+-----------+-------------+--------+------------+
| entity_type | entity_id | bundle      | status | changed    |
+-------------+-----------+-------------+--------+------------+
| node        | 36        | session     | 1      | 1336520756 |
| node        | 37        | session     | 1      | 1336510489 |
| node        | 38        | session     | 1      | 1336510456 |
| node        | 39        | session     | 1      | 1336510456 |
| node        | 40        | speaker_bio | 1      | 1336510456 |
+-------------+-----------+-------------+--------+------------+
• When a node is updated, the “changed” timestamp is updated.
• The indexing pipeline tracks the largest timestamp and entity_id which has been indexed.
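One way to read the "largest timestamp and entity_id" rule is as a cursor over rows ordered by (changed, entity_id). The Python sketch below selects the next batch after a stored position; it is a guess at the logic under that assumption, not the module's actual query, and `next_batch` is a hypothetical name. The rows are the ones from the table above.

```python
ROWS = [
    {"entity_type": "node", "entity_id": 36, "bundle": "session", "changed": 1336520756},
    {"entity_type": "node", "entity_id": 37, "bundle": "session", "changed": 1336510489},
    {"entity_type": "node", "entity_id": 38, "bundle": "session", "changed": 1336510456},
    {"entity_type": "node", "entity_id": 39, "bundle": "session", "changed": 1336510456},
    {"entity_type": "node", "entity_id": 40, "bundle": "speaker_bio", "changed": 1336510456},
]


def next_batch(rows, last_changed, last_entity_id, limit=50):
    """Return up to `limit` rows that sort after the last indexed position.

    Tuples compare element by element, so rows sharing a timestamp are
    ordered by entity_id and none are skipped or indexed twice.
    """
    position = (last_changed, last_entity_id)
    pending = [r for r in rows if (r["changed"], r["entity_id"]) > position]
    pending.sort(key=lambda r: (r["changed"], r["entity_id"]))
    return pending[:limit]


# Suppose node 38 (changed = 1336510456) was the last entity indexed.
batch = next_batch(ROWS, 1336510456, 38)
```

Tracking the entity_id alongside the timestamp matters here because nodes 38, 39 and 40 share the same "changed" value.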
Example: Taxonomy Term Classifying a Node is Changed
[Diagram: taxonomy term renamed from “Grapefruit” to “Citrus fruit”]
All nodes classified with this term are queued to be re-indexed by setting the “changed” column to the current time. Thus you will correctly match ‘Citrus’ instead of ‘Grapefruit’ for those documents.
function apachesolr_taxonomy_term_update($term)
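The effect of such a term-update handler can be sketched as follows. This is an illustrative Python version under the assumption that the tracking rows and the node-to-term mapping are available in memory; the function and parameter names are hypothetical and do not mirror the module's PHP implementation.

```python
import time


def requeue_nodes_for_term(tracking_rows, node_terms, term_id, now=None):
    """Mark every node tagged with term_id as changed so it is re-indexed.

    tracking_rows: list of dicts with "entity_type", "entity_id", "changed".
    node_terms: dict mapping a node ID to the set of term IDs on that node.
    Returns the number of rows that were touched.
    """
    now = int(time.time()) if now is None else now
    touched = 0
    for row in tracking_rows:
        is_tagged = term_id in node_terms.get(row["entity_id"], ())
        if row["entity_type"] == "node" and is_tagged:
            row["changed"] = now  # the next cron run picks this row up again
            touched += 1
    return touched
```

No document is sent to Solr at this point; the rows are only flagged, and the regular indexing run does the work.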
When Unpublished, Content is Purged
• Drupal core includes a simple editorial workflow where content may be toggled between published (visible) and unpublished (incomplete, removed, spam, etc).
• The module immediately removes content from the index when unpublished, and also tracks it for future removal in case the Solr server is unavailable.
Search Using Dismax Query Parsing & Boosting Features
• Dynamic fields in schema.xml used to index standard and custom entity data fields
• Dismax (or EDismax) handler used for keyword searching across multiple fields and per-field boosts
• Query-time boosting options available in the UI
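Per-field boosts in a Dismax-style query are expressed through the `qf` parameter, as a space-separated list of `field^boost` pairs. The Python sketch below builds such a request; `defType` and `qf` are standard Solr parameters, while the field names and boost values are assumptions chosen for illustration.

```python
from urllib.parse import urlencode, parse_qs


def build_dismax_params(keywords, field_boosts, parser="edismax"):
    """Return a query string for a keyword search across boosted fields.

    field_boosts: dict of field name -> boost, e.g. {"label": 5.0}.
    """
    qf = " ".join(
        "{}^{}".format(field, boost) for field, boost in field_boosts.items()
    )
    return urlencode({
        "q": keywords,
        "defType": parser,  # "dismax" or "edismax"
        "qf": qf,           # fields to search, each with its own boost
        "wt": "json",
    })


# A match in the title counts five times as much as one in the body.
dismax_query = build_dismax_params("hello drupal", {"label": 5.0, "content": 1.0})
```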
A Query Object Is Used to Prepare and Run Searches
$query->setParam('hl.fl', $field);
$keys = $query->getParam('q');
$response = $query->search();
HOOK_apachesolr_query_prepare($query)
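The idea of a query object that modules can adjust before it runs can be sketched like this. It is a Python illustration, not the module's PHP class: `QuerySketch`, `prepare`, and the `content` field are hypothetical, and the actual search call is left out because it needs a running Solr server. The `hl` and `hl.fl` highlighting parameters are standard Solr parameters.

```python
class QuerySketch:
    """Holds the parameters of one search so callbacks can inspect and alter them."""

    def __init__(self, keywords):
        self._params = {"q": keywords}

    def get_param(self, name, default=None):
        return self._params.get(name, default)

    def set_param(self, name, value):
        self._params[name] = value

    def params(self):
        # Return a copy so callers cannot bypass set_param().
        return dict(self._params)


def enable_highlighting(query):
    """Example prepare callback: highlight matches in one field."""
    query.set_param("hl", "true")
    query.set_param("hl.fl", "content")  # assumed field name


def prepare(query, callbacks):
    """Run every prepare callback against the query before it is sent."""
    for callback in callbacks:
        callback(query)
    return query


prepared = prepare(QuerySketch("hello drupal"), [enable_highlighting])
```

Passing one mutable object through every callback is what lets independent modules each add a filter, a boost, or a highlighting option to the same search.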
More Modules Available to Add More Features
A few examples:
• ApacheSolr Attachments
• Apache Solr Multisite Search
• Apache Solr Organic Groups Integration
• Apachesolr User indexing
• Apachesolr Commerce
To Wrap Up !
• Drupal has extensive Apache Solr integration already, and is highly customizable.
• The Drupal platform is widely adopted, and the Drupal community drives rapid innovation.
• Acquia provides Enterprise Drupal support and a network of partners.
• Acquia includes a secure, hosted Solr index with every support subscription.
Did I Answer These?
• What is Drupal?
• What Apache Solr features are integrated with Drupal?
• Why is Drupal plus Apache Solr better than starting from scratch?
• What elements of the search can you configure in the UI without code?
Other PHP Integration Tools
• http://www.solarium-project.org/
• http://php.net/solr http://pecl.php.net/package/solr
• http://code.google.com/p/solr-php-client/
Caveat: don’t use serialized PHP response format in a custom integration - use JSON writer.
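As an illustration of the JSON route, the sketch below parses a response body of the shape Solr returns with `wt=json`. It is written in Python for brevity; the same structure applies from PHP. The sample body is made up for this example, and `extract_hits` is a hypothetical helper name. One commonly cited reason to prefer JSON is that it is plain data, whereas deserializing PHP-serialized input from a remote service is generally considered risky.

```python
import json

# Made-up sample body in the shape of a Solr JSON response.
SAMPLE_RESPONSE = """
{
  "responseHeader": {"status": 0, "QTime": 3},
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [{"entity_id": 101, "label": "Hello Drupal", "bundle": "session"}]
  }
}
"""


def extract_hits(raw_body):
    """Parse a JSON response body and return (total matches, documents)."""
    data = json.loads(raw_body)
    status = data["responseHeader"]["status"]
    if status != 0:
        raise ValueError("Solr reported status {}".format(status))
    response = data["response"]
    return response["numFound"], response["docs"]


total, docs = extract_hits(SAMPLE_RESPONSE)
```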
Acquia is Hiring!
• Do you love Drupal, Solr, the LAMP stack, DevOps or anything related, and working at a fast-growing and successful startup?
• Boston and Portland area U.S. offices.
• Some remote opportunities as well.
• Come talk to me!
[email protected] in IRC #drupal or #solr
Resources ... Questions? !
http://drupal.org/project/apachesolr
http://drupal.org/project/apachesolr_attachments
http://archive.org/details/drupalconchi_day2_attain_apache_solr_coding_chops
http://www.acquia.com/tags/apachesolr
http://groups.drupal.org/lucene-nutch-and-solr