
Enterprise Search with ColdFusion Solr

Dan Sirucek cf.Objective 2012 May 2012


About Me

• Senior Learning Technologist at WellPoint, Inc.
• Developer for 14 years
• Developing in ColdFusion for 8 years
• Started in SQL Server, ASP, ASP.NET, VB.NET
• Also work in Flash Builder/Flex, Java, and C#


Where We’ve Been: Growth and Consolidation

WellPoint, Inc. was formed in 2004 as the result of a merger between Anthem, Inc. and WellPoint Health Networks, creating the nation’s largest health benefits company by membership.


Where We Are: National Scale

1 out of 9 Americans are covered by WellPoint’s affiliated health plans

Note: Provider Network refers to BlueCard® PPO Network

• Nation’s Largest Insurer
  • ~34 million medical members
• Total Revenue
  • Nearly $60 billion
• Provider Network Advantage
  • ~94% Hospitals
  • ~82% Primary Physicians
  • ~84% Specialists
• Blue Licensee
  • 14 states


Agenda

• Problem and Goal
• Why Apache Solr for ColdFusion 9.0.1
• Solr Multi-core Overview
• Replication Overview
• Installation
• Replication Configuration
• Managing Collections on Multiple Solr Instances
• Extending ColdFusion Solr Schema
• Creating a Custom Search
• Q & A
• Resources


Problem and Goal

• Problem
  • Slow search response
  • Constant corruption issues
  • Verity wasn’t scalable
  • No redundancy
• Goal
  • Improve search response
  • Create an enterprise scalable solution
  • Implement redundancy for high availability
  • Maintain compatibility with <cfsearch /> & <cfindex /> tags


Why Apache Solr for ColdFusion 9.0.1

• Performance
  • Fast, very fast
  • Optimized for high volume web traffic
• Scalable
  • Distributed searches
  • Replication
  • Redundancy
• Replication supports
  • Master
  • Slave
  • Repeater


Solution Architecture


Technologies Used

• Windows Server 2008 64-bit
• IIS 7.0
• Application Request Routing
• ColdFusion 9.0.1 Multi-server
• Apache Tomcat 6
  • Master instance
• Apache Solr Standalone Installation for ColdFusion 9.0.1
  • Slave instances
• Java SE JDK 1.6.0_26 64-bit


Solr Multi-core Overview

• Solr core = ColdFusion collection
• Multiple cores
  • Single Solr instance
  • Each Solr core has its own configuration and index
  • Unified administration
• Multi-core template
  • A template is used for creating a new core (collection)
  • The template contains a directory structure and the configuration files needed to create a new core
  • Location: SolrInstallationDirectory\multicore\template


Solr Multi-core Template

• conf directory
  • Contains configuration files used when creating a new Solr core
  • Two key files:
    schema.xml
    – Contains the details about which fields your index can contain
    – How those fields should be dealt with when adding documents to the index
    – How those fields should be dealt with when querying those fields
    solrconfig.xml
    – Contains the configuration settings for the Solr core
    – Used to configure replication


Solr Multi-core Template Continued

• conf directory continued
  • Files referenced by schema.xml:
    protwords.txt
    – Words that need protection from stemming
    – e.g. without protection, “maine” is stemmed to “main”
    stopwords.txt
    – Words to not index, e.g. a, an, and
    synonyms.txt
    – Synonym groups, e.g. GB,gib,gigabyte,gigabytes
    – Mappings used for spelling corrections, e.g. hippa => hipaa
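The two kinds of synonyms.txt entries can be told apart mechanically. The following is a minimal Python sketch of that distinction, not Solr's own parser: comma-separated lines are treated as equivalence groups and `=>` lines as one-way mappings. The function name `parse_synonyms` and the sample content are illustrative assumptions.

```python
def parse_synonyms(text):
    """Split synonyms.txt content into equivalence groups and one-way mappings."""
    groups, mappings = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if "=>" in line:
            # one-way mapping: every term on the left is rewritten to the right side
            left, right = line.split("=>", 1)
            targets = [term.strip() for term in right.split(",")]
            for source in left.split(","):
                mappings[source.strip()] = targets
        else:
            # equivalence group: all terms are treated as interchangeable
            groups.append([term.strip() for term in line.split(",")])
    return groups, mappings


sample = """\
# equivalent terms
GB,gib,gigabyte,gigabytes
# spelling correction
hippa => hipaa
"""
groups, mappings = parse_synonyms(sample)
print(groups)
print(mappings)
```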


Solr Multi-core Template Continued

• conf directory continued
  • Optional file:
    solrcore.properties
    – User-defined properties to be referenced within solrconfig.xml
    – Syntax: Property=Value
    – File is referenced by default when present in conf directory
    – Example: MASTER_CORE_URL=http://masterHostnameUrl:masterPort/solr
• data directory
  • Empty directory
  • Solr will create the following directories the first time content is indexed:
    index
    spellindex
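To illustrate how a property defined in solrcore.properties ends up inside solrconfig.xml, here is a rough Python sketch of `${name}` and `${name:default}` substitution. It only imitates the idea; Solr performs this substitution itself when a core loads. The helper names and the `POLL_TIME` value are assumptions for illustration.

```python
import re


def load_properties(text):
    """Parse Property=Value lines into a dict, ignoring blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def substitute(config_text, props):
    """Replace ${name} and ${name:default} placeholders with property values."""
    pattern = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")

    def replace(match):
        name, default = match.group(1), match.group(2)
        if name in props:
            return props[name]
        if default is not None:
            return default
        return match.group(0)  # simplification: leave unknown placeholders untouched

    return pattern.sub(replace, config_text)


props = load_properties(
    "MASTER_CORE_URL=http://masterHostnameUrl:masterPort/solr\n"
    "POLL_TIME=00:00:60\n"  # hypothetical value
)
print(substitute('<str name="pollInterval">${POLL_TIME}</str>', props))
```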


Solr Replication Overview

• Replication Features
  • Efficient and automated distribution of index additions, updates, and deletions
  • Pull strategy allows for easy addition of slaves
  • Configurable distribution interval allows tradeoff between timeliness and cache utilization (the interval is set by the slave instance)
  • Replication and automatic reloading of configuration files
  • Works over HTTP
  • Works across platforms with same configuration
• Replication Modes
  • Master – optimized for indexing
  • Slave – optimized for searches
  • Repeater – used in WAN to reduce bandwidth between data centers


Solr Replication Considerations & Challenges

• Considerations
  • Replication is not a server-level configuration
  • Replication is configured at the Solr core (search collection) level
  • New cores need to be created on all Solr instances
• Challenges
  • Modify the multi-core template to implement replication when new cores are created
  • Automate the creation of a Solr core on all Solr instances
  • Create a consolidated view of cores on all instances


Solr Replication Requirements

• Basic Requirements
  • One master Solr instance
  • One or more slave Solr instances
  • Configuration of replication request handlers on master and slave instances
• Replication Request Handler
  • Configuration is handled in solrconfig.xml
  • Replication is defined by adding a request handler using XML syntax
  • Settings are used to set the properties for the request handler
  • Master and slave instances are both configured using a request handler, but use different settings to define their roles


Master Replication Request Handler

• Replication request handler with all possible attributes • Screen shot


Required Master Settings

• replicateAfter
  • Configures when replication will be triggered
  • Valid values: startup, commit, optimize
  • If using the startup option, a commit or optimize entry is also needed to trigger replication on future commits or optimizes

• Example:


Recommended Master Settings

• confFiles
  • Used to specify configuration files to be replicated
  • Comma-delimited list of files to replicate
  • Can be configured to rename files on replication
    Syntax – source_file_name.xml:destination_file_name.xml

• Example:
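Since the slide's screenshot is not in the transcript, the following Python sketch renders one plausible master request handler that combines the replicateAfter and confFiles settings described above. The handler name `/replication` and class `solr.ReplicationHandler` are standard Solr; the chosen events and file list are assumptions, not the presenter's actual configuration.

```python
import xml.etree.ElementTree as ET


def master_handler(replicate_after, conf_files):
    """Render a master replication request handler for solrconfig.xml."""
    triggers = "\n".join(
        f'    <str name="replicateAfter">{event}</str>' for event in replicate_after
    )
    conf_list = ",".join(conf_files)
    return (
        '<requestHandler name="/replication" class="solr.ReplicationHandler">\n'
        '  <lst name="master">\n'
        f"{triggers}\n"
        f'    <str name="confFiles">{conf_list}</str>\n'
        "  </lst>\n"
        "</requestHandler>"
    )


xml_text = master_handler(
    # startup plus commit, so later commits also trigger replication
    replicate_after=["startup", "commit"],
    # the first entry uses the source:destination rename syntax
    conf_files=["solrconfig_slave.xml:solrconfig.xml", "schema.xml", "stopwords.txt"],
)
print(xml_text)
root = ET.fromstring(xml_text)  # confirms the fragment is well-formed XML
```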


Optional Master Settings

• backupAfter
  • Configures when a backup will be created
  • Valid values: optimize, startup, commit
• maxNumberOfBackups
  • Maximum number of backups to retain
• commitReserveDuration
  • Default 10 seconds
  • If commits are very frequent and network is slow, you can tweak this value


Slave Replication Request Handler

• Slave replication request handler with all possible settings • Add screen shot and high level notes


Required Slave Settings

• Configuration file
  • solrconfig.xml
• masterUrl
  • Sets the URL of the Solr master instance
  • ${solr.core.name} – system variable
• pollInterval
  • Sets the interval at which the slave polls the master for changes
  • Considerations
    Frequency of updates to index
    Network bandwidth
    Acceptable latency
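When weighing the pollInterval considerations above, it can help to turn the hh:mm:ss value into seconds and estimate how stale a slave may get. This is a back-of-the-envelope Python sketch; the lag formula (one full interval plus transfer time) is a simplifying assumption, not a figure from Solr.

```python
def poll_interval_seconds(value):
    """Convert an hh:mm:ss poll interval into seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def worst_case_lag_seconds(poll_interval, transfer_seconds):
    """Rough upper bound: a commit just after a poll waits one full interval,
    then the changed index files still have to be transferred."""
    return poll_interval_seconds(poll_interval) + transfer_seconds


print(poll_interval_seconds("00:00:60"))
print(worst_case_lag_seconds("00:05:00", transfer_seconds=20))
```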


Optional Slave Settings

• httpConnTimeout
  • Sets connection timeout on the underlying HttpConnectionManager
  • Default value 5000ms
• httpReadTimeout
  • Sets timeout when fetching index from master
  • Default value 10000ms
• httpBasicAuthUser
  • Use if basic authentication is enabled on master
• httpBasicAuthPassword
  • Use if basic authentication is enabled on master
• Compression
  • Use only if your bandwidth is low


Slave Replication Configuration Examples

• Basic configuration example

• Using solrcore.properties configuration example
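The two slave variants named above might look roughly like the output of this Python sketch. The first uses literal values; the second refers to the MASTER_CORE_URL and POLL_TIME properties from solrcore.properties. The host name, port, and interval are placeholders, and the exact URL layout used in the talk is an assumption.

```python
import xml.etree.ElementTree as ET

SLAVE_TEMPLATE = """\
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">{master_url}</str>
    <str name="pollInterval">{poll_interval}</str>
  </lst>
</requestHandler>"""

# Basic variant: literal values (hypothetical host and interval).
basic = SLAVE_TEMPLATE.format(
    master_url="http://solr-master.example.com:8080/solr/${solr.core.name}/replication",
    poll_interval="00:00:60",
)

# Properties variant: Solr resolves ${...} from solrcore.properties at core load.
with_properties = SLAVE_TEMPLATE.format(
    master_url="${MASTER_CORE_URL}/${solr.core.name}/replication",
    poll_interval="${POLL_TIME}",
)

for fragment in (basic, with_properties):
    ET.fromstring(fragment)  # raises ParseError if the fragment is malformed
    print(fragment, end="\n\n")
```

Keeping the master URL and poll time in solrcore.properties means the same solrconfig.xml can be replicated to every slave unchanged.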


Slave Solr Installation

• Slave Servers
  • Windows Server 2008 (64-bit, 8 GB RAM)
  • Install Java SE JDK 1.6.0_26 64-bit
    Note location of installation directory
    – Example: D:\Apps\Java\jdk1.6.0_26
  • Execute Apache Solr Standalone Installation for ColdFusion 9.0.1 installer
    Change Java Home from default to: javaInstallationDirectory\jdk1.6.0_26\jre
    – Example: D:\Apps\Java\jdk1.6.0_26\jre


Master Solr Installation

• Master Solr Server
  • Windows Server 2008 (64-bit, 8 GB RAM)
  • Download Java JDK 1.6.0_26 64-bit
  • Download Apache Tomcat 6 32-bit/64-bit Windows Service Installer
  • Execute Java JDK installer
    Note installation directory
    – Example: E:\Apps\java
  • Execute the Tomcat 6 installer
    Java JRE – specify the jre in the jdk 1.6.0_26 installation
    – Example: E:\Apps\Java\jdk1.6.0_26\jre
    Select installation directory
    – Example: E:\Apps\tomcat6


Master Solr Installation Continued

• Master Solr Installation continued
  • Create a solr directory – example E:\Apps\solr
  • Copy the following from the slave installation:
    solr.war to solr directory
    – installationDirectory\webapps\solr.war
    Multi-core directory to solr directory
    – installationDirectory\multicore
• Configure Tomcat service
  • Launch Configure Tomcat
  • Java tab
  • Set initial memory pool
  • Set maximum memory pool


Configure Tomcat for Solr

• Stop Apache Tomcat 6 service
• Create solr context
  • A Context is what Tomcat calls a web application
  • Location: tomcatInstallDir\conf\Catalina\localhost\
  • Create a solr.xml file
  • Edit solr.xml and define Solr context
  • Example:
• Start Apache Tomcat 6 service
• Launch Tomcat 6 - http://127.0.0.1:8080/manager/html
• Navigate to solr application
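The context file from the "Create solr context" step could be generated as in this Python sketch. It follows the commonly documented Solr-on-Tomcat pattern (a `docBase` pointing at solr.war and a `solr/home` environment entry). The paths reuse the E:\Apps\solr example from the installation slides, and treating the multicore directory as Solr home is an assumption.

```python
import xml.etree.ElementTree as ET

CONTEXT_TEMPLATE = """\
<Context docBase="{war}" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="{home}" override="true"/>
</Context>"""

context_xml = CONTEXT_TEMPLATE.format(
    war=r"E:\Apps\solr\solr.war",    # solr.war copied from the slave installation
    home=r"E:\Apps\solr\multicore",  # assumed Solr home: the copied multicore directory
)
print(context_xml)

# Parse the result to confirm it is well-formed before saving it as
# tomcatInstallDir\conf\Catalina\localhost\solr.xml
context = ET.fromstring(context_xml)
```

Tomcat names the web application after the context file, so solr.xml yields the /solr application that then appears in the manager page.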


Tomcat 6 Web Application Manager


Slave Configuration

• Apache Solr for ColdFusion 9.0.1 runs in a Jetty servlet container
• Jetty Configuration
  • Configuration file location: SolrInstallationDirectory\etc\jetty.xml
  • Connector system properties
    jetty.port – default = 8983
    jetty.host – default = not defined
  • Default configuration listens only on 127.0.0.1
  • Add jetty.host system property to the connector setting
    0.0.0.0 = listen on all IPs
    Example:


Slave Jetty Configuration Continued

• Default connector configuration

• After update


Slave Service Configuration

• Service start up configuration
  • Default Java maximum memory setting is 256 MB
    InstallationDirectory\solr.lax
  • Adjust maximum memory setting -Xmx
  • Add a minimum memory setting -Xms

• Example:


Master Solr Multi-core Template Configuration

• Create solrcore.properties
  • Create a text file named solrcore.properties in the Solr multicore template directory
  • Add two properties
    MASTER_CORE_URL=http://masterHostnameUrl:masterPort/solr
    POLL_TIME=hh:mm:ss
  • Example:
• Create solrconfig_slave.xml
  • Make a copy of solrconfig.xml in the master Solr multicore template directory
  • Name the file solrconfig_slave.xml


Master Solr Multi-core Template Configuration Continued

• Configure solrconfig.xml for replication
  • Add master and slave replication request handlers
  • solrconfig.xml
  • solrconfig_slave.xml


Slave Solr Multi-core Template Configuration

• solrcore.properties
  • Copy solrcore.properties in template/conf directory on master to template/conf directory on slave
• solrconfig.xml
  • Delete solrconfig.xml file in template/conf on slave
  • Copy solrconfig_slave.xml in template/conf directory on master to template/conf directory on slave
  • Rename solrconfig_slave.xml to solrconfig.xml on slave


Creating New Collections

• Collections (cores) need to be created on all Solr instances
• Use Solr API to create new cores
  • REST-like API
  • Create new core parameters
    action – CREATE
    name – name of new core
    instanceDir – directory path for new instance
    template – directory path for the core template
    wt – writer type
    – Format of response
    – Options: json, javabin, xml
    – Default = xml
    version = 1
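As a rough illustration of the parameters listed above, this Python sketch assembles a core-creation request URL. The parameter names come from the slide; the `/admin/cores` path is Solr's default core admin path, and the host, core name, and directory values are hypothetical.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def create_core_url(base_url, name, instance_dir, template, wt="json"):
    """Build the CoreAdmin request that creates a new core (collection)."""
    params = {
        "action": "CREATE",
        "name": name,
        "instanceDir": instance_dir,
        "template": template,  # parameter name as listed on the slide
        "wt": wt,
    }
    return f"{base_url}/admin/cores?{urlencode(params)}"


url = create_core_url(
    "http://solr-master.example.com:8080/solr",  # hypothetical instance
    name="hr_docs",
    instance_dir="hr_docs",
    template="template",
)
print(url)
print(parse_qs(urlparse(url).query))
```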


Creating New Collections Code

• In CF create an array of server instances • Define collection name


Creating New Collections Code Continued

• Loop over server instance array • Create collection on each instance


Collection Create Result Struct

• De-serialized file content (cfdump from previous slide)
  • core – collection name
  • responseHeader
    QTime – query time in milliseconds
    status
  • saved
    File path to multicore\solr.xml
    The multicore\solr.xml file is used to store core names and instance directories
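The result struct described above can be read the same way outside ColdFusion. The Python sketch below de-serializes a JSON response and pulls out the same keys. The sample body is made up for illustration; the one fact relied on is that Solr reports success as status 0.

```python
import json


def summarize_create_response(body):
    """Extract the fields of interest from a core-creation JSON response."""
    data = json.loads(body)
    header = data.get("responseHeader", {})
    return {
        "core": data.get("core"),          # collection name
        "ok": header.get("status") == 0,   # Solr reports success as status 0
        "qtime_ms": header.get("QTime"),   # query time in milliseconds
        "saved": data.get("saved"),        # path to multicore\solr.xml
    }


# Hypothetical response body, shaped like the struct described above.
sample_body = json.dumps({
    "responseHeader": {"status": 0, "QTime": 125},
    "core": "hr_docs",
    "saved": "E:\\Apps\\solr\\multicore\\solr.xml",
})
summary = summarize_create_response(sample_body)
print(summary)
```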


Solr Admin Master Replication

• Core admin
  • Navigate to Replication
• Replication admin
  • Index version
  • Location
  • Size


Solr Admin Slave Replication

• Core admin
  • Navigate to Replication
• Replication admin
  • Master
  • Poll Interval
  • Local Index Version & location
    Replication status
  • Controls
    Disable Poll
    Replicate Now
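The admin page controls have HTTP equivalents on the replication handler, which is useful for scripting. This Python sketch builds those URLs. The command names (`disablepoll`, `enablepoll`, `fetchindex`, `indexversion`, `details`) are Solr ReplicationHandler commands; the host and core name are placeholders.

```python
from urllib.parse import urlencode

# Commands accepted by Solr's ReplicationHandler over HTTP.
REPLICATION_COMMANDS = {"disablepoll", "enablepoll", "fetchindex", "indexversion", "details"}


def replication_command_url(core_url, command):
    """Build a replication command URL for one core on one Solr instance."""
    if command not in REPLICATION_COMMANDS:
        raise ValueError(f"unsupported replication command: {command}")
    query = urlencode({"command": command})
    return f"{core_url}/replication?{query}"


slave_core = "http://solr-slave1.example.com:8983/solr/hr_docs"  # hypothetical
print(replication_command_url(slave_core, "disablepoll"))  # "Disable Poll"
print(replication_command_url(slave_core, "fetchindex"))   # "Replicate Now"
```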


Deleting Collections

• Collections (cores) should be deleted from all Solr instances
• Use Solr API to delete cores
  • Delete core parameters
    action – UNLOAD
    core – name of core to delete
    wt – writer type
    – Format of response
    – Options: json, javabin, xml
    – Default = xml
    version = 1
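Because a collection has to be removed from every instance, the request is naturally issued in a loop. The Python sketch below shows that pattern with the HTTP call passed in as a function, so it can be tried without a running Solr server. The instance URLs and the `fetch` stand-in are assumptions.

```python
from urllib.parse import urlencode


def unload_core_urls(instances, core, wt="json"):
    """Build one UNLOAD request per Solr instance."""
    query = urlencode({"action": "UNLOAD", "core": core, "wt": wt})
    return [f"{base}/admin/cores?{query}" for base in instances]


def unload_everywhere(instances, core, fetch):
    """Call fetch(url) for every instance and collect the raw responses."""
    return {url: fetch(url) for url in unload_core_urls(instances, core)}


instances = [
    "http://solr-master.example.com:8080/solr",  # hypothetical master
    "http://solr-slave1.example.com:8983/solr",  # hypothetical slave
]
calls = []


def fake_fetch(url):
    """Stand-in for an HTTP GET; records the URL and returns a canned body."""
    calls.append(url)
    return '{"responseHeader": {"status": 0}}'


results = unload_everywhere(instances, "hr_docs", fake_fetch)
for url in results:
    print(url)
```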


Delete Collections Code

• Loop over server instance array • Delete collection on each instance


Extend ColdFusion Solr Schema (cfcore)

• Reasons to extend/change default functionality
  • Change default operator
    The default is OR
  • Enable delete by key capability
  • Enable case sensitivity on search
• Possible changes to schema.xml
  • Default operator between words is OR
    Changing default operator to AND will reduce number of results


Extend ColdFusion Solr Schema – Enable Delete by Key

• Enable delete by key
  • Default unique key is a system generated identifier
  • Possible use case
    Use API to delete indexed content by the key value
  • Changes
    Create a copy of schema.xml and name it schema_slave.xml
    Update the replication confFiles setting to use schema_slave.xml:schema.xml
    Changes to schema.xml
    – Change indexed attribute on key field to true
    – Change unique key from uid to key
    Changing unique key on slave instances will break cfsearch tag


Extend ColdFusion Solr Schema – Enable case sensitivity on search

• Enable case sensitivity on search
  • Default configuration uses a filter to change text to lower case
  • Possible use case
    Search by title and retain case sensitivity
  • Schema change
    Comment out solr.LowerCaseFilterFactory


Creating a Custom Search

• Use case
  • Return category facet counts
  • Date range search
• Solr Search API
  • Basic query parameters
    q – search query
    fq – filter query
    qt – query type: name of the request handler in solrconfig.xml
    start – start row
    rows – number of rows to return in response
    fl – comma-delimited list of fields to include in response
    wt – response writer type
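A date range search with the basic parameters above might be assembled as in this Python sketch. The `/select` path and the parameter names are standard Solr; the host, core name, field list, and the `modified` field used in the filter are assumptions about the index.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def search_url(core_url, text, start=0, rows=10,
               fields=("title", "summary", "score"), filters=()):
    """Build a Solr select URL from the basic query parameters."""
    params = [
        ("q", text),
        ("start", start),
        ("rows", rows),
        ("fl", ",".join(fields)),
        ("wt", "json"),
    ]
    # fq may be repeated; each filter narrows the results without affecting scoring
    params += [("fq", clause) for clause in filters]
    return f"{core_url}/select?{urlencode(params)}"


url = search_url(
    "http://solr-slave1.example.com:8983/solr/hr_docs",  # hypothetical core
    "benefits enrollment",
    filters=["modified:[2012-01-01T00:00:00Z TO 2012-05-31T23:59:59Z]"],
)
print(url)
print(parse_qs(urlparse(url).query)["fq"])
```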


Creating a Custom Search Continued

• Solr Search API continued
  • Highlight parameters
    hl – enable highlighted snippets to be generated
    hl.fragsize – the size, in characters, of the snippets created by the highlighter
    hl.snippets – maximum number of snippets to generate per field
    hl.simple.pre – text which appears before highlighted term
    hl.simple.post – text which appears after highlighted term
  • Facet parameters
    facet – enable facet counts in query response
    facet.field – specify a field which should be treated as a facet
    facet.mincount – minimum count to include facet in response
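The highlight and facet parameters above are plain query-string entries that get added to a search request. A minimal Python sketch follows; the default values and the `category` facet field are illustrative assumptions.

```python
from urllib.parse import urlencode


def facet_highlight_params(facet_field, pre="<em>", post="</em>",
                           fragsize=100, snippets=1, mincount=1):
    """Collect the highlight and facet parameters for a search request."""
    return {
        "hl": "true",
        "hl.fragsize": fragsize,
        "hl.snippets": snippets,
        "hl.simple.pre": pre,
        "hl.simple.post": post,
        "facet": "true",
        "facet.field": facet_field,
        "facet.mincount": mincount,
    }


params = facet_highlight_params("category")
encoded = urlencode(params)  # markup such as <em> is percent-encoded here
print(encoded)
```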


Creating a Custom Search Continued

• JSON-specific parameter
  • json.nl
    Controls the output format of the NamedList used for field faceting data
    flat (default) – flat array of alternating names and values
    – Example: [name1,val1, name2,val2]
    map – JSON object
    – A NamedList can have repeated keys and preserves order; a JSON object (a hash) cannot represent repeated keys and does not guarantee order
    arrarr – an array of two-element arrays
    – Example: [[name1,val1], [name2, val2], [name3,val3]]
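Client code has to unpack facet counts differently depending on json.nl. This Python sketch normalizes the flat and arrarr shapes into the same list of (name, count) pairs. The facet names and counts are invented sample data.

```python
def pairs_from_flat(flat):
    """json.nl=flat: names and values alternate in a single array."""
    return list(zip(flat[0::2], flat[1::2]))


def pairs_from_arrarr(arrarr):
    """json.nl=arrarr: each entry is already a [name, value] pair."""
    return [tuple(item) for item in arrarr]


flat_counts = ["policies", 12, "forms", 7, "training", 3]
arrarr_counts = [["policies", 12], ["forms", 7], ["training", 3]]

print(pairs_from_flat(flat_counts))
print(pairs_from_arrarr(arrarr_counts))
```

Both shapes keep the order returned by Solr, which is why they are often preferred over map for facet data.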


Creating a Custom Search Code

• Code Review


Custom Search User Interface Example


Q & A

Dan Sirucek
Sr. Learning Technologist, Learning Technologies and Content
21555 Oxnard Dr, MS: CAAC08-088I
Woodland Hills, CA 91316
Tel (818) 234-8017
Mobile (323) 251-1236
www.wellpoint.com


Resources

• Apache Tomcat 6 - http://tomcat.apache.org/download-60.cgi

• Apache Solr Standalone Installer for ColdFusion 9.0.1 - http://www.adobe.com/support/coldfusion/downloads.html

• Java JDK 1.6.0_26 download - http://www.oracle.com/technetwork/java/javase/downloads/jdk-6u26-download-400750.html

• Apache Solr - http://lucene.apache.org/solr/

• Solr Wiki - http://wiki.apache.org/solr/FrontPage

• Solr Replication - http://wiki.apache.org/solr/SolrReplication

• Solr JSON Response Writer - http://wiki.apache.org/solr/SolJSON#JSON_Query_Response_Format

• Solr Facet Parameters - http://wiki.apache.org/solr/SimpleFacetParameters

• Solr Highlighting Parameters - http://wiki.apache.org/solr/HighlightingParameters