Word Up! Using Lucene for full-text search of your data set
Full-text search
Review of full-text search options
Focus on Lucene
Integrating Lucene with JPA/Hibernate
Full-text search options
‘LIKE’ queries
SQL extensions
Kludge with web search engine
Kludge with web search appliance
Embeddable search library
‘LIKE’ queries
Simple, straightforward
Fast, easy to implement
Large result set
Limited fuzziness (wildcard or regex)
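A minimal sketch of the ‘LIKE’ approach in plain JDBC, to show how little fuzziness a wildcard match offers. The table and column names (MARKER_NOTES, NOTE_TEXT) and the choice of '!' as the escape character are assumptions made for illustration, not part of the original deck.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class LikeSearch {
    private LikeSearch() {}

    /** Wraps a user term in % wildcards, escaping LIKE metacharacters with '!'. */
    static String likePattern(String term) {
        String escaped = term.replace("!", "!!")   // escape the escape character first
                             .replace("%", "!%")
                             .replace("_", "!_");
        return "%" + escaped + "%";
    }

    /** Prepares a substring search; MARKER_NOTES and NOTE_TEXT are hypothetical names. */
    static PreparedStatement prepare(Connection conn, String term) throws SQLException {
        PreparedStatement ps = conn.prepareStatement(
            "SELECT * FROM MARKER_NOTES WHERE NOTE_TEXT LIKE ? ESCAPE '!'");
        ps.setString(1, likePattern(term));
        return ps;
    }
}
```

The pattern matches only literal substrings: a search for "lucine" will not find "lucene", and nothing ranks the rows that do match.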
Full-text search extensions
No standard syntax (Sybase, MSSQL, DB2, etc. all different)
Administrative overhead for text search indices
Other limitations
Kludge with search engine
External indexing/search software
ht://Dig
mnoGoSearch
Sphinx
Xapian
Not necessarily pure Java
Can be database-intensive
Lag in updating search index
Kludge with search appliance
“Black-box” solutions
Thunderstone
Google Search Appliance
Your data set mixes with public content
Doesn’t always work as advertised
Can’t fine-tune search
Embeddable search library
Search library
Example: Apache Lucene
Deploys as part of your application
100% Java
Fuzzy full-text search (Levenshtein algorithm)
Searches against text, numeric, boolean fields with multiple options
Can be integrated with JPA/Hibernate via Hibernate Search, Compass
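The Levenshtein algorithm mentioned above counts the single-character insertions, deletions and substitutions needed to turn one string into another. The sketch below is a plain textbook version written for illustration. It is not Lucene's implementation, and the class and method names are made up for this example.

```java
final class EditDistance {
    private EditDistance() {}

    /** Returns the Levenshtein distance between a and b using two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                    Math.min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
                    prev[j - 1] + cost);                    // match or substitution
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("lucene", "lucine"));  // prints 1
        System.out.println(levenshtein("kitten", "sitting")); // prints 3
    }
}
```

A fuzzy search accepts terms within a small distance of the query term, so a typo such as "lucine" can still match "lucene".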
About Lucene
Search index stored on file system (also JDBC and BDB options)
Can store/retrieve data to/from search index (Lucene Projections)
Can index HTML, XML, Office docs, PDFs, Exchange mail with external tools
Supports extended and multi-byte character sets by default
More about Lucene
Indexes each record as a Lucene Document object
Lucene Document doesn’t have to be a literal document – can be any arbitrary object
Document can have any number of name-value pairs
Synchronizing your data with the search index is someone else’s problem …
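To make the "document as name-value pairs" idea concrete, here is a toy inverted index in plain Java. It is a sketch for illustration only: it does not use Lucene's API, the ToyIndex class is hypothetical, and its tokenizer simply lower-cases and splits on non-word characters.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ToyIndex {
    // "field:token" -> ids of the documents that contain that token in that field
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final List<Map<String, String>> documents = new ArrayList<>();

    /** Adds a document (any set of name-value pairs) and returns its id. */
    int add(Map<String, String> fields) {
        int id = documents.size();
        documents.add(new LinkedHashMap<>(fields));
        for (Map.Entry<String, String> field : fields.entrySet()) {
            String[] tokens = field.getValue().toLowerCase(Locale.ROOT).split("\\W+");
            for (String token : tokens) {
                if (token.isEmpty()) continue;
                postings.computeIfAbsent(field.getKey() + ":" + token, k -> new TreeSet<>())
                        .add(id);
            }
        }
        return id;
    }

    /** Returns the ids of documents whose given field contains the word. */
    Set<Integer> search(String field, String word) {
        String key = field + ":" + word.toLowerCase(Locale.ROOT);
        return postings.getOrDefault(key, Collections.<Integer>emptySet());
    }
}
```

Nothing here knows about the database: whenever a record changes, someone has to call add again (and remove the stale entry), which is the synchronization problem the slide refers to.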
Integrating with JPA/Hibernate
Most common method: Hibernate Search
Supports only Hibernate as the JPA provider
Automatically updates search index when object persisted to database
Entity classes mapped to separate indexes
Entity fields mapped to Lucene index fields using Java annotations
Integrating with JPA/Hibernate …
Alternate method: Compass Project
Supports Hibernate, OpenJPA, others
No release since 2009 – effectively unsupported
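Annotated class example …
import java.io.Serializable;
import javax.persistence.Cacheable;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.NumericField;
import org.hibernate.search.annotations.Store;

@Indexed
@Entity
@Cacheable(true)
@Table(name="MARKER", schema="MAPLINK")
public class Marker extends MarkerA implements Serializable {
    @Id
    @Column(name="MKR_MARKERID")
    @Field(store=Store.YES)
    private long mkrMarkerid;
    @Column(name="MKR_LAT", nullable = true)
    @Field(store=Store.YES)
    @NumericField
    private Double mkrLat;
    @Column(name="MKR_LONG", nullable = true)
    @Field(store=Store.YES)
    @NumericField
    private Double mkrLong;
    // ... remaining fields and accessors not shown on the slide
}
@Indexed – tells Hibernate Search that this entity class should be indexed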
@Field – tells Hibernate Search to create a matching name-value pair in the search index for this property
Store.YES – stores the value for retrieval directly from the index, without touching the database
@NumericField – index as a numeric value, enables greater than / less than / range searches
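Why numeric indexing matters for range searches can be shown without Lucene at all. The sketch below is plain Java and only motivates the idea. It says nothing about how Lucene encodes numeric fields internally, and the NumericRangeDemo class and its method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class NumericRangeDemo {
    private NumericRangeDemo() {}

    /** Builds a sorted index from latitude to the marker ids at that latitude. */
    static NavigableMap<Double, List<Long>> indexByLatitude(Map<Long, Double> latById) {
        NavigableMap<Double, List<Long>> index = new TreeMap<>();
        for (Map.Entry<Long, Double> e : latById.entrySet()) {
            if (e.getValue() == null) continue; // MKR_LAT is nullable
            index.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return index;
    }

    /** Returns ids with latitude in [from, to]; requires from <= to. */
    static List<Long> idsInRange(NavigableMap<Double, List<Long>> index, double from, double to) {
        List<Long> ids = new ArrayList<>();
        for (List<Long> bucket : index.subMap(from, true, to, true).values()) {
            ids.addAll(bucket);
        }
        return ids;
    }

    /** True when text ordering and numeric ordering of two numbers disagree. */
    static boolean textOrderDisagrees(String a, String b) {
        int asText = Integer.signum(a.compareTo(b));
        int asNumber = Integer.signum(Double.compare(Double.parseDouble(a), Double.parseDouble(b)));
        return asText != asNumber;
    }
}
```

Compared as text, "9.5" sorts after "10.25", so a range query over values indexed as plain strings would return the wrong markers. Indexing the value as a number keeps greater than / less than comparisons meaningful.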
Let’s take a Luke at the index …
Practical search exercise
Questions!