Do Some SOLR Searching in .NET DAVID HOERSTER

Upload: david-hoerster

Post on 26-Jan-2015


DESCRIPTION

You've designed your application, built it up, and it's working great. One of the last features to implement is searching and reporting. You've put it off because you really don't want to deal with SQL Server Full-Text Indexing - maybe it's not your cup of tea or maybe it's just intimidating. But there are alternatives to Full-Text Indexing that can be just as powerful and fairly simple. SOLR is one such tool to help you with your application's searching needs. We'll take a look at the SOLR product, how you can get it up and running very easily, how you can install it as a Windows Service (as opposed to a command window), and how you can use SOLR.net to program against it.

TRANSCRIPT

Page 1: Do Some Solr Searching

Do Some SOLR Searching in .NET

DAVID HOERSTER

Page 2: Do Some Solr Searching

About Me

C# MVP (Since April 2011)

Director of Web Solutions at RGP

Co-Founder of BrainCredits (braincredits.com)

Conference Director for Pittsburgh TechFest

Past President of Pittsburgh .NET Users Group

Organizer of recent Pittsburgh Code Camps and other Tech Events

Twitter - @DavidHoerster

Blog – http://geekswithblogs.net/DavidHoerster

Email – [email protected]

Page 3: Do Some Solr Searching

Take Aways

What is SOLR

When You May Use SOLR

How to Integrate SOLR in a .NET Application

Strategies for Managing RDBMS and SOLR transactions

Page 4: Do Some Solr Searching

Agenda

Searching in Apps

Hello SOLR

Installing and Running SOLR

Admin Interface

Using SOLR in .NET
◦ Retrieving Data
◦ Modifying Collections
◦ Interesting Features (Highlighting, Snippets, Facets)

Page 5: Do Some Solr Searching

Searching in Applications

Page 6: Do Some Solr Searching

Searching in Applications

How do we accomplish these?
◦ Stored Procs?
◦ Bunch of LIKE’s?
◦ SQL Server Full-Text?
◦ Something else?

Lots of solutions

SOLR could be one

Page 7: Do Some Solr Searching

SOLR

Open Source

Search Service Platform

Built on Lucene

Provides a number of features, such as
◦ Full Text Indexing
◦ Hit Highlighting
◦ Faceted Searches
◦ Clustering and Replication

HTTP REST-like interface, providing results in JSON, XML, CSV, and other formats

Written in Java and runs within the JVM

Page 8: Do Some Solr Searching

Why Use SOLR?

Small application or prototype environment

Mixed environment or maybe non-SQL Server environment

NoSQL usage that doesn’t have full-text indexing

Features required such as faceted search, highlighting, more-like-this

Extensible search features and data types

Page 9: Do Some Solr Searching

SOLR Deployment (Basic)

[Diagram: Client, Application (e.g. web server – port 80), and SOLR Service (port 8983), connected over HTTP]

Application may or may not connect directly with SOLR

SOLR Service runs within JVM

Usually not best to publicly expose SOLR

Page 10: Do Some Solr Searching

Some Things to Remember

SOLR does not have authentication built in
◦ Treat it as a service to your application
◦ Do not expose externally unless you want the world to search

SOLR is not a document database in the league of MongoDB
◦ Some NoSQL features
◦ Flat structures (MongoDB has some depth)
◦ Some examples use SOLR like a DB… more for expedience and simplicity, not as a recommendation

Page 11: Do Some Solr Searching

My Implementation of SOLR

[Diagram. Components: Web Client, Web Server (PHP), SOLR Instance, Content Database (postgreSQL / SQL, SOLR Indexer (.NET), GIT Repository, Remote Repo. Connection labels: HTTP, Search, Get, Create / Update, Fetch. Zones: Public Internet, Internal Network]

Page 12: Do Some Solr Searching

Installing SOLR

Very simple to quickly get up and running

Assumes you have JRE installed

Download SOLR from http://lucene.apache.org/solr/mirrors-solr-latest-redir.html?

Extract ZIP file to a directory of your choice
◦ I chose C:\SOLR as my SOLR root

From a command prompt, navigate to the examples directory and start the Jetty server
◦ cd c:\solr\4.4.0\examples
◦ java -jar start.jar

That’s it – SOLR is ready to go!

Default “collection1” core is set up (but you’ll probably want to delete it)

Page 13: Do Some Solr Searching

SOLR Administration Interface

Admin UI available out of the box

Check status

Add/Remove Cores

Issue Queries

Check Logs

Modify Schemas

Lots more!

Page 14: Do Some Solr Searching

SOLR Collections, Schemas and Documents

Collection is a group of similar items
◦ Like a table in SQL

Document is a single item in a collection
◦ Defines an item to be searched
◦ Contains fields
◦ Document is like a SQL row

Fields are individual properties of a document
◦ Like a SQL column
◦ Has a type and a value

Schema defines the structure of documents in a collection
◦ Defines fields, types, keys, dynamic fields and copy rules

Schema basic structure:

<schema>
   <types>
   <fields>
   <uniqueKey>
   <defaultSearchField>
   <solrQueryParser defaultOperator>
   <copyField>
</schema>

Page 15: Do Some Solr Searching

Document Fields

The area of the schema you are most likely to alter

Various data types available built-in
◦ int, float, string, date, …

Fields have a number of properties
◦ can be single or multi-valued
◦ fields like ‘text’ are great for concatenating fields together for aggregated searching
◦ you can choose to index a field, store the field value, or both

<field name="lahmanId" type="int" indexed="true" stored="true" required="true" multiValued="false" />

Page 16: Do Some Solr Searching

Querying

Let’s Get Some Data!!

SOLR is based on the inverted index concept
◦ Instead of IDs mapped to entries, words are mapped to IDs.
◦ Analyzers then traverse the inverted index and evaluate relevance

Admin UI provides a quick and dirty interface to retrieve data

Most query options available

Can also specify format

Once parameters are issued, the URL is available as a reference
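The inverted index idea can be illustrated with a toy sketch. This is not how SOLR or Lucene is implemented: the class name `TinyInvertedIndex` and its methods are made up for illustration, tokenizing is a naive split on spaces and punctuation, and there is no relevance scoring at all. It only shows words being mapped to document IDs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy class: maps each lower-cased word to the IDs of the documents containing it.
public sealed class TinyInvertedIndex
{
    private static readonly char[] Separators = { ' ', ',', '.', ';', ':', '!', '?' };

    // word -> sorted set of document IDs (the "postings" for that word)
    private readonly Dictionary<string, SortedSet<int>> _postings =
        new Dictionary<string, SortedSet<int>>();

    // Indexes one document by recording its ID under every word it contains.
    public void Add(int docId, string text)
    {
        foreach (var word in Tokenize(text))
        {
            if (!_postings.TryGetValue(word, out var ids))
            {
                ids = new SortedSet<int>();
                _postings[word] = ids;
            }
            ids.Add(docId);
        }
    }

    // Returns the IDs of documents that contain every word in the query (a simple AND).
    public IReadOnlyList<int> Search(string query)
    {
        IEnumerable<int> result = null;
        foreach (var word in Tokenize(query))
        {
            if (!_postings.TryGetValue(word, out var ids))
            {
                return new int[0]; // one word is unknown, so nothing can match
            }
            result = result == null ? ids : result.Intersect(ids);
        }
        return result == null ? new int[0] : result.OrderBy(id => id).ToArray();
    }

    private static IEnumerable<string> Tokenize(string text)
    {
        return text.ToLowerInvariant()
                   .Split(Separators, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

Looking up a word is a single dictionary hit rather than a scan of every document, which is the point of inverting the index.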

Page 17: Do Some Solr Searching

Querying

Basic parameter is `q`
◦ http://localhost:8983/solr/<collection>/select?q=<field>:<value>

Other basic parameters include:
◦ Field List (fl) – selects the fields to return
◦ Sorting (sort) – specifies the fields to sort on and direction
◦ Row Offset (start) – which row to start with when returning results (default is 0)
◦ Caching (cache) – tells SOLR whether to cache the results (default is true)
◦ Rows to return (rows) – how many rows to return in the call (default is 10)

These are all query string parameters.
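As a rough sketch of how these parameters end up in a request, the helper below assembles a query URL with plain .NET string handling. The class and method names (`SolrUrlSketch`, `BuildSelectUrl`) are made up for this example, and it assumes the core exposes the usual `/select` request handler and that `fl` is the parameter used to limit the returned fields.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SolrUrlSketch
{
    // Hypothetical helper: appends URL-encoded query string parameters
    // to the /select handler of a core (assumed to be the handler in use).
    public static string BuildSelectUrl(
        string coreUrl,
        IEnumerable<KeyValuePair<string, string>> parameters)
    {
        var pairs = parameters.Select(p =>
            Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value));

        return coreUrl.TrimEnd('/') + "/select?" + string.Join("&", pairs);
    }
}
```

Passing `q=text:aaron`, `fl=id,title`, `sort=year desc`, `start=0` and `rows=10` for the baseball core produces a URL that can be pasted into a browser or compared with the one the Admin UI shows after running a query.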

Page 18: Do Some Solr Searching

Demo

USING THE ADMIN INTERFACE

Page 19: Do Some Solr Searching

Working with SOLR in .NET

solrnet library
◦ https://code.google.com/p/solrnet/
◦ Source: https://github.com/mausch/SolrNet/tree/master/SolrNet

WARNING: If you’re using SOLR 4+
◦ Committing in solrnet will throw an error
◦ Need to download latest code from GitHub and compile
◦ Or download a package’s code and remove the initialization of the waitFlush property from solr/commands/parameters/CommitOptions.cs

Page 20: Do Some Solr Searching

Set Up Typed Entities

using System;
using SolrNet.Attributes;

// Each property is mapped to a field in the SOLR schema by name.
public class Quote
{
    [SolrUniqueKey("id")]
    public String Id { get; set; }

    [SolrField("title")]
    public String Title { get; set; }

    [SolrField("articleBody")]
    public String ArticleBody { get; set; }

    [SolrField("year")]
    public Int32 Year { get; set; }

    [SolrField("abstract")]
    public String Abstract { get; set; }

    [SolrField("source")]
    public String Source { get; set; }
}

Page 21: Do Some Solr Searching

Initializing solrnet

Startup.Init<Quote>("http://localhost:8983/solr/historicalQuotes");
Startup.Init<Hitter>("http://localhost:8983/solr/baseball");

ISolrOperations<Quote> _solr =
    ServiceLocator.Current.GetInstance<ISolrOperations<Quote>>();

Uses Microsoft p&p’s ServiceLocator class to get SOLR instance

Page 22: Do Some Solr Searching

Issuing a Query

Basic query, as it selects everything:

var quotes = _solr.Query(new SolrQuery("*:*"));

Returns just those records with an id of 12345:

var quotes = _solr.Query(new SolrQuery("id:12345"));

Searches for specific text, and only returns 3 fields:

var query = new SolrQuery("text:" + id);
var options = new QueryOptions()
{
    Fields = new[] { "id", "title", "source" }
};
var results = _solr.Query(query, options);

Page 23: Do Some Solr Searching

Filter Queries

‘fq’ parameter

Runs the filter against the entire index and caches the results

Can help speed up searching if you know of common, recurring searches

In solrnet, use the FilterQueries QueryOption

_solr.Query("*:*", new QueryOptions
{
    FilterQueries = new ISolrQuery[]
    {
        new SolrQueryByField("HR", "[50 TO *]"), …
    }
});

Page 24: Do Some Solr Searching

Modifying Data in SOLR

Using the existing SOLR instance to perform an insert…

_solr.Add(theQuote);
_solr.Commit();

Use the same instance to perform an update…

_solr.Add(theQuote);
_solr.Commit();
_solr.Optimize();

Commit writes your changes to SOLR’s index

Optimize rebuilds the index
◦ More expensive
◦ Be mindful of when you call it

Page 25: Do Some Solr Searching

Search Features (Query Options)

Highlighting

Highlight = new HighlightingParameters()
{
    Fields = new[] { "articleBody", "abstract" },
    Fragsize = 200,
    AfterTerm = "</em></strong>",
    BeforeTerm = "<em><strong>",
    UsePhraseHighlighter = true
    //, AlternateField = "source"
}

More Like This

MoreLikeThis = new MoreLikeThisParameters(new[] { "articleBody", "source" })
{
    MinDocFreq = 1,
    MinTermFreq = 1
}

Page 26: Do Some Solr Searching

Search Features (Query Options)

Faceted Search

Facet = new FacetParameters()
{
    Queries = FacetQueryCategories(minHomeRuns)
}

private SolrFacetQuery[] FacetQueryCategories(Int32 minHomeRuns)
{
    // Wrap the range query in a SolrFacetQuery so it matches the return type.
    var salaryFacet1 = new SolrFacetQuery(
        new SolrQueryByRange<Int32>("salary", 0, 1000000));
    ...
    return new[] { salaryFacet1 };
}

Page 27: Do Some Solr Searching

Demo

USING SOLR FEATURES TO ENHANCE YOUR SEARCH EXPERIENCE

Page 28: Do Some Solr Searching

Handling the Distribution for Mods

[Diagram: Client, Server, SOLR, RDBMS]

Send the modification to the RDBMS and to SOLR and hope for the best.

Pretty optimistic!

Page 29: Do Some Solr Searching

Handling the Distribution for Mods

[Diagram: Client, Server, SOLR, RDBMS; labels: "Check for error", "Rollback if SOLR error"]

Wrap the RDBMS call in a System.Transactions transaction (TransactionScope) and roll back if SOLR throws an exception.

More cautious
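A minimal sketch of this more cautious approach, assuming the database call enlists in the ambient transaction the way ADO.NET connections typically do. `DualWriteSketch`, `SaveThenIndex` and the two delegates are hypothetical names for this example; the database and SOLR calls are passed in as delegates so the block stands alone.

```csharp
using System;
using System.Transactions;

public static class DualWriteSketch
{
    // Hypothetical helper: runs the database write inside a TransactionScope and
    // completes the scope only if the SOLR call also succeeds.
    // Returns false when either step throws; the scope is then disposed without
    // Complete(), which rolls back any work enlisted in the transaction.
    public static bool SaveThenIndex(Action saveToDatabase, Action addToSolr)
    {
        try
        {
            using (var scope = new TransactionScope())
            {
                saveToDatabase(); // assumed to enlist in the ambient transaction
                addToSolr();      // SOLR does not enlist; an exception here aborts the scope
                scope.Complete(); // reached only when both calls succeeded
            }
            return true;
        }
        catch (Exception)
        {
            return false;
        }
    }
}
```

This covers the case on the slide, SOLR failing before the database commit. If the SOLR call succeeds and the database commit then fails, the index holds a document the database does not.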

Page 30: Do Some Solr Searching

Handling the Distribution for Mods

[Diagram: Client, Server, Queue, Command Handler, RDBMS, SOLR; label: "Persist Command"]

Drop a command into a queue for a Command Handler to pick up.

Command Handler/Domain processes and raises Event which can end up in SOLR.

More complicated, but more reliable.

More Message Oriented (CQRS???)
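The message-oriented flow can be sketched in a few lines. Everything here is hypothetical and in-memory: `SaveQuoteCommand`, `QuoteSavedEvent` and `QuoteCommandHandler` are illustrative names, and a real system would use a durable queue rather than `Queue<T>`. The sketch only shows the shape: a command is queued, the handler persists it, then raises an event that a SOLR indexer can subscribe to.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical command and event types for this sketch.
public sealed class SaveQuoteCommand
{
    public string Id { get; set; }
    public string Title { get; set; }
}

public sealed class QuoteSavedEvent
{
    public string Id { get; set; }
    public string Title { get; set; }
}

public sealed class QuoteCommandHandler
{
    // In-memory stand-in for a durable queue.
    private readonly Queue<SaveQuoteCommand> _queue = new Queue<SaveQuoteCommand>();
    private readonly Action<SaveQuoteCommand> _persist;

    // Subscribers (for example a SOLR indexer) react after the command is persisted.
    public event Action<QuoteSavedEvent> QuoteSaved;

    public QuoteCommandHandler(Action<SaveQuoteCommand> persist)
    {
        _persist = persist;
    }

    public void Enqueue(SaveQuoteCommand command)
    {
        _queue.Enqueue(command);
    }

    // Drains the queue: persist each command, then raise the event.
    // Returns the number of commands processed.
    public int ProcessPending()
    {
        var processed = 0;
        while (_queue.Count > 0)
        {
            var command = _queue.Dequeue();
            _persist(command);
            var handler = QuoteSaved;
            if (handler != null)
            {
                handler(new QuoteSavedEvent { Id = command.Id, Title = command.Title });
            }
            processed++;
        }
        return processed;
    }
}
```

Because the SOLR update is driven by the event rather than by the original request, a failed index update can be retried or replayed without touching the database write.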

Page 31: Do Some Solr Searching

SOLR as a Windows Service

NSSM can install SOLR quickly
◦ Non Sucking Service Manager
◦ http://nssm.cc/
◦ Version 2.16 (hasn’t been updated in a little while)

Launch NSSM as administrator
◦ nssm install SOLR

Java.exe is the executable

Command Line args are (specific to my install directory):
◦ -Djetty.logs=C:/solr/logs/ -Djetty.home=C:/solr/ -Dsolr.solr.home=C:/solr/solr/ -cp C:/solr/lib/*.jar;C:/solr/start.jar -jar C:/solr/start.jar

Name the service and hit install. Done!

Page 32: Do Some Solr Searching

What’s Next

Other query techniques
◦ Boosting: http://localhost:8983/solr/historicalQuotes/select/?defType=dismax&q=text&qf=source^20.0+text^0.3
◦ Spatial
◦ Sounds like

SOLR Cloud
◦ SOLR replication and sharding
◦ Moving to the enterprise space

Extending SOLR Behaviors and Using Other Parsers

Using Dynamic Properties

Using SOLR in a full NoSQL Environment

Page 33: Do Some Solr Searching

Resources

SOLR Reference Guide
https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide

SOLR Tutorial
http://lucene.apache.org/solr/4_4_0/tutorial.html

Nice SOLR Walk-Through
http://www.solrtutorial.com/

Books
◦ Apache Solr 4 Cookbook (Packt Publishing)
◦ Apache Solr 4 In Action (Manning – MEAP)