solr pattern


Upload: opensource-connections

Post on 11-May-2015


Category:

Technology



DESCRIPTION

A pattern for implementing search where search is a business problem, not a technological problem, especially if you've chosen Solr!

TRANSCRIPT

Page 1: Solr pattern

A PATTERN FOR IMPLEMENTING SOLR


Page 2: Solr pattern

BOTTOM LINE UP FRONT

• Migrating from an existing search architecture to the Solr platform is less an exercise in technology and coding, and more an exercise in project management, metrics, and managing expectations.


Page 3: Solr pattern

• “Typically smart people, fed into the search migration project meat grinder, produce hamburger quality results.  Okay search, with okay relevance, and an okay project.  But if you apply this pattern, you'll get back steak!”   - Arin Sime


Page 4: Solr pattern

Project Definition (we start here)

Precursor Work

Prototype (typical starting point for a technology-driven team)

Implementation

Testing/QA (repeats!)

Deployment

Ongoing Tuning (forgotten phase for a technology-driven team)

I want feedback!


Page 5: Solr pattern

PROGRAMMERS DOMINATE

• We dive right into writing indexers and building queries

• We skip the first two phases!

• We don’t plan for the last phase!


Page 6: Solr pattern

NEED HETEROGENEOUS SKILLS

• More so than a regular development project, we need multiple skills:

• Business Analysts

• Developers

• QA/Testers

• Report Writers

• Big Brain Scientists

• Content Folks (Writers)

• End Users

• UX Experts

• Ops Team

• Librarians!


Page 7: Solr pattern

PHASE 1: PROJECT DEFINITION

• A well-understood part of any project, right?

• Objectives, key success criteria, evaluated risks

• Leads to a Project Charter:

• structure, team membership, acceptable tradeoffs


Page 8: Solr pattern

CHALLENGES

• Competing business stakeholders:

• Tester: When I search for “lamp shades”, I used to see these documents; now I see a different set.

• Business Owner: How do I know that the new search engine is better?

• User: My pet feature “search within these results” works differently.

• Marketing Guy: I want to control the results so the current marketing push for toilet paper brand X always shows up at the top.


Page 9: Solr pattern

CHALLENGES

• Stakeholders want a better search implementation, but perversely often want it to all work “the exact same way”. Getting all the stakeholders to agree on the project vision, and on the metrics, is a challenge.


Page 10: Solr pattern

CHALLENGES

• It can be difficult to bring non-technical folks onto the Search Team.

• Have a content driven site? You need them to provide the right kind of content to fit into your search implementation!


Page 11: Solr pattern

ENSURING SKILLS NEEDED

• Search is something everybody uses daily, but it is its own specialized domain

• Solr does pass the 15-minute rule, but don’t get overconfident!


Page 12: Solr pattern

PERFECT SOLR PERSON WOULD BE ALL OF

• Mathematician

• Librarian

• UX Expert

• Writer

• Programmer

• Business Analyst

• Systems Engineer

• Geographer!

• Psychologist


Page 13: Solr pattern

KNOWLEDGE TRANSFER

• If you don’t have the perfect team already, bring in experts and do domain knowledge transfer.

• Learn the vocabulary of search to better communicate together

• “auto complete” vs “auto suggest”

• Do “Solr for Content Team” brownbag sessions!


Page 14: Solr pattern


Page 15: Solr pattern

HAVE A COOL PROJECT NAME!


Page 16: Solr pattern

PROJECT LIMELIGHT

“Putting our content in the limelight”


Page 17: Solr pattern

PHASE 2: PRECURSOR WORK

• A somewhat tenuous phase: this is about making sure that we can measure the goals defined in the project definition.

• Do we have tools to track “increase conversions through search”?

• In a greenfield search, we don’t have any previous relevancy/recall to measure against, but in a brownfield migration project we can do some apples to (apples? oranges?) comparisons.
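One way to make a goal like “increase conversions through search” measurable is to compute a conversion rate over sessions that used search. The sketch below is only an illustration: the session fields `used_search` and `converted` are assumed names, and what counts as a conversion is up to the business.

```python
def search_conversion_rate(sessions):
    """Share of search-using sessions that ended in a conversion.

    Each session is a dict with two boolean fields, 'used_search' and
    'converted'. Both field names are assumptions made for this sketch.
    """
    searched = [s for s in sessions if s["used_search"]]
    if not searched:
        return 0.0  # no search sessions, so nothing to measure yet
    converted = sum(1 for s in searched if s["converted"])
    return converted / len(searched)
```

In a brownfield migration, the same calculation run against the old and the new search gives one of the before/after comparisons this phase is looking for.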


Page 18: Solr pattern

METRICS


Page 19: Solr pattern

DATA COLLECTION

• Have we been collecting enough data about current search patterns to measure success against?

• Often folks have logs that record search queries but are missing crucial data like number of results returned per query!
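A minimal sketch of a search log entry that records the result count alongside the query. The field names (`query`, `num_found`, `duration_ms`) and the one-JSON-object-per-line format are assumptions for illustration, not something the slides prescribe.

```python
import json
import time


def make_search_log_entry(query, num_found, duration_ms, timestamp=None):
    """Build one JSON log line for a single search request.

    All field names are illustrative assumptions.
    """
    entry = {
        "timestamp": time.time() if timestamp is None else timestamp,
        "query": query,
        "num_found": num_found,      # number of results returned for the query
        "duration_ms": duration_ms,  # how long the search took
    }
    return json.dumps(entry, sort_keys=True)
```

With the result count in every entry, questions such as “which queries return nothing?” can be answered from the logs alone.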


Page 20: Solr pattern

RELEVANCY

• Do we have any defined relevancy metrics?

• Relevancy is like porn.....


Page 21: Solr pattern

I KNOW IT WHEN I SEE IT!

http://en.wikipedia.org/wiki/Les_Amants


Page 22: Solr pattern


Page 23: Solr pattern

MEASURE USER BEHAVIOR

• Are we trying to solve user interaction issues with existing search?

• Do we have the analytics in place? Google Analytics? Omniture?


Page 24: Solr pattern

POGOSTICKING

(image from http://searchpatterns.org/)


Page 25: Solr pattern

THRASHING

(image from http://searchpatterns.org/)


Page 26: Solr pattern

BROAD BASE OF SKILLS

• Not your normal “I am a developer, I crank out code” type of tasks!


Page 27: Solr pattern

INVENTORY USERS

• Search often permeates multiple systems... “I can just leverage your search to power my content area”

• Do you know which third party systems are actually accessing your existing search?

• A plan for cutting the cord on an existing search platform!

Users as in “Systems”!
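One possible way to start such an inventory is to count which clients hit the existing search endpoint. This is a sketch only: it assumes a made-up, simplified log format of `<client> <path>` per line and a hypothetical `/search` path, so a real access log would need its own parsing.

```python
from collections import Counter


def inventory_search_clients(log_lines, search_path="/search"):
    """Count requests per client for lines that hit the search endpoint.

    Assumes each line is '<client> <path>' (a simplified, hypothetical format).
    """
    counts = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        client, path = parts[0], parts[1]
        if path.startswith(search_path):
            counts[client] += 1
    return counts
```

Any client in the result that nobody on the team recognizes is a system to track down before cutting the cord.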


Page 28: Solr pattern

PHASE 3: PROTOTYPE

• The fun part! <-- Why tech-driven teams start here!

• Solr is a very simple and robust platform.

• Most of the time should be spent defining the schema needed to support the search queries, and indexing the correct data.


Page 29: Solr pattern

GOING FROM QUESTIONS TO ANSWERS


Page 30: Solr pattern

INDEXING: PUSH ME PULL ME

• Are we in a pull environment?

• DIH

• Crawlers

• Scheduled Indexers

• Are we in a push environment?

• Sunspot
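In a push environment, the application itself sends documents to Solr whenever they change. The sketch below builds (but does not send) such a request against Solr's JSON update handler; the base URL, the core name `products`, and the document fields are assumptions for illustration, and the exact update path can differ between Solr versions.

```python
import json
import urllib.request


def build_update_request(solr_base_url, core, docs):
    """Build an HTTP POST that pushes a list of documents to Solr.

    The request is only constructed here; pass it to
    urllib.request.urlopen() to actually send it.
    """
    url = f"{solr_base_url}/{core}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical usage: one document pushed to a local Solr core.
request = build_update_request(
    "http://localhost:8983/solr",
    "products",
    [{"id": "598", "title": "Lamp shade"}],
)
```

A pull setup inverts this: Solr (or a crawler, or a scheduled job) fetches the data on its own schedule, so the application code never needs a call like this.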


Page 31: Solr pattern

VERIFY INDEXING STRATEGY

• Use the complete dataset, not a partial load!

• Is indexing time performance acceptable?

• Quality of indexed data? Duplicates? Odd characters?
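The data quality questions above can be checked mechanically before or after a full load. This is a rough sketch, assuming documents are plain dicts with an `id` field and that “odd characters” means control characters or the Unicode replacement character; a real corpus may need a broader definition.

```python
import unicodedata
from collections import Counter


def find_duplicate_ids(docs, id_field="id"):
    """Return the ids that occur more than once, sorted."""
    counts = Counter(doc[id_field] for doc in docs)
    return sorted(doc_id for doc_id, n in counts.items() if n > 1)


def has_odd_characters(text):
    """True if text contains U+FFFD or a control character.

    Tab, newline, and carriage return are treated as normal.
    """
    for ch in text:
        if ch == "\ufffd":
            return True  # replacement character: a sign of a bad decode
        if unicodedata.category(ch) == "Cc" and ch not in "\t\n\r":
            return True
    return False
```

Running both checks over the complete dataset, not a sample, is the point: duplicates and encoding damage tend to hide in the long tail.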


Page 32: Solr pattern

WHERE IS SEARCH BUSINESS LOGIC?

• Does it go Solr-side in request handlers (solrconfig.xml)?

• Is it specified as lots of URL parameters?

• Do you have a frontend library like Sunspot that provides a layer of abstraction/DSL?
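The first two options can be contrasted in a short sketch. In the first function the front end carries all the search business logic as URL parameters; in the second, the logic is assumed to be configured Solr-side in a request handler, so the front end only sends the user's query. The field names (`title`, `description`, `in_stock`), the boosts, and the `/products` handler are hypothetical.

```python
from urllib.parse import urlencode


def query_with_url_params(base_url, user_query):
    """All business logic travels in the URL, built by the front end."""
    params = {
        "q": user_query,
        "defType": "edismax",         # query parser
        "qf": "title^2 description",  # fields to search, with boosts
        "fq": "in_stock:true",        # filter query
        "rows": 10,
    }
    return f"{base_url}/select?{urlencode(params)}"


def query_with_request_handler(base_url, user_query, handler="/products"):
    """Business logic lives Solr-side; the front end sends only the query."""
    return f"{base_url}{handler}?{urlencode({'q': user_query})}"
```

The second style keeps relevancy tuning in one place, at the cost of making the front end's behavior depend on configuration that lives outside its own code base.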


Page 33: Solr pattern

HOOKING SOLR UP TO FRONTEND

• The first integration tool may not be the right one!

• A simple query/result is very easy to do.

• A highly relevant query/result is very difficult to do.


Page 34: Solr pattern

PART OF PROTOTYPING IS DEPLOYMENT

• Make sure that when you demo the prototype Solr, it’s been deployed into an environment like QA.

• Running Solr by hand on a developer’s laptop is NOT enough.

• Deployment (configuration management, environment, 1-click deploy) needs to be at least looked at.


Page 35: Solr pattern

PHASE 4: IMPLEMENTATION

• Back on familiar ground! We are extending the data being indexed, enhancing search queries, adding features.

• Apply all the patterns of any experienced development team.

• Just don’t forget to involve your non-techies in defining approaches!


Page 36: Solr pattern

INDEXERS PROLIFERATE!

• Make sure you have strong patterns for indexers

• A good topic for a code review!


Page 37: Solr pattern

PHASE 5: TESTING/QA

• Most typical testing patterns apply, EXCEPT:

• It can be tough to automate testing if data is changing rapidly

• You want the full dataset at your fingertips

• You can still do it!


Page 38: Solr pattern

WATCH OUT FOR RELEVANCY!

• It sometimes seems like once you validate one search, the previous one starts failing

• How do you empirically measure this?

• You need production-like data sets during QA

• Don’t get tied up in whether doc id 598 is the third result. Be happy that 598 shows up in the first 10 results!
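One way to measure this empirically is to keep a list of query/expected-document pairs and rerun all of them after every relevancy change, checking only that each expected document lands somewhere in the top N. The sketch below assumes a `run_query` callable that returns an ordered list of document ids; that callable and the expectation format are illustrative, not part of Solr.

```python
def appears_in_top_n(result_ids, expected_id, n=10):
    """True if expected_id is among the first n results."""
    return expected_id in result_ids[:n]


def failed_expectations(expectations, run_query, n=10):
    """Return the queries whose expected document is missing from the top n.

    expectations: dict mapping query text to the expected document id
    run_query: callable that takes a query and returns ordered doc ids
    """
    failures = []
    for query, expected_id in expectations.items():
        if not appears_in_top_n(run_query(query), expected_id, n):
            failures.append(query)
    return failures
```

Because every expectation is rechecked on each run, a fix for one search that quietly breaks an earlier one shows up as a new entry in the failure list.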


Page 39: Solr pattern

EXPLORATORY TESTING?

• ...simultaneous learning, test design and test execution

• Requires tester to understand the corpus of data indexed

• behave like a user

http://en.wikipedia.org/wiki/Exploratory_testing

James Bach


Page 40: Solr pattern

STUMP THE CHUMP

• You can always write a crazy search query that Solr will barf on... Is that what your users are typing in?


Page 41: Solr pattern

DOES SOLR ADMIN WORK?

• Do searches via Solr Admin reflect what the front end does? If not, provide your own test harness!

• Make ad hoc searches by QA really, really easy

• “Just type these 15 URL params in!” is not an answer!


Page 42: Solr pattern

PHASE 6: DEPLOYMENT

• Similar to any large scale system

• Network plumbing tasks, multiple servers, IP addresses

• Hopefully all environment variables are external to Solr configurations?

• Think about monitoring: replication, query load!


Page 43: Solr pattern

DO YOU NEED UPTIME THROUGH RELEASE?

• Solr is code, configuration, and data all at once! Do you have to reindex your data?

• Can you reindex your data from someplace else?


Page 44: Solr pattern


Page 45: Solr pattern

PRACTICE THIS PROCESS!

• Mapping out the steps to back up cores, redeploy new ones, and update master and slave servers is fairly straightforward if done ahead of time

• These steps are a great thing to involve your Ops team in


Page 46: Solr pattern

PHASE 7: ONGOING TUNING

• The part we forget to budget for!

• Solr has many knobs and dials, and you need to keep tweaking them as:

• the data set being indexed changes

• the behavior of users changes


Page 47: Solr pattern

HAVE REGULAR CHECKINS WITH CONTENT PROVIDERS

• Have an editorial calendar of content? Evaluate what synonyms you are using based on content

• Can you better highlight content using Query Elevation to boost certain documents?


Page 48: Solr pattern

QUERY TRENDS

• Look at queries returning 0 results

• Are queries getting slower or faster?

• Are users leveraging all the features available to them?

• Do your analytics highlight negative behaviors such as pogosticking or thrashing?

• AUTOMATE THESE REPORTS!
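As a starting point for automating the first of these reports, the sketch below lists the most frequent queries that returned zero results. It assumes log entries are dicts with `query` and `num_found` fields, which are illustrative names rather than a standard log format.

```python
from collections import Counter


def zero_result_queries(log_entries, top=10):
    """Return the most common zero-result queries as (query, count) pairs.

    Queries are lowercased and stripped so trivial variants are grouped.
    """
    counts = Counter(
        entry["query"].strip().lower()
        for entry in log_entries
        if entry["num_found"] == 0
    )
    return counts.most_common(top)
```

Run on a schedule, the output doubles as a to-do list for synonyms and content gaps to raise with the content providers.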


Page 49: Solr pattern

Query Duration

Less than 0.5s: 69%

0.5-1.0s: 20%

1.0-1.5s: 6%

1.5-2.0s: 2%

2.0-2.5s: 1%

> 2.5s: 2%

89% of all queries take less than 1s
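A breakdown like this can be produced from logged query durations. The sketch below buckets durations in seconds using the same 0.5s steps; treating each bucket as half-open (a query of exactly 0.5s falls into the 0.5-1.0s bucket) is an assumption, since the chart does not say how boundaries are handled.

```python
def duration_shares(durations_s, edges=(0.5, 1.0, 1.5, 2.0, 2.5)):
    """Return the percentage of queries in each duration bucket.

    Buckets are [0, 0.5), [0.5, 1.0), ... with a final bucket for
    everything at or above the last edge.
    """
    counts = [0] * (len(edges) + 1)
    for d in durations_s:
        for i, edge in enumerate(edges):
            if d < edge:
                counts[i] += 1
                break
        else:
            counts[-1] += 1  # slower than every edge
    total = len(durations_s)
    if total == 0:
        return [0.0] * len(counts)
    return [100.0 * c / total for c in counts]
```

The headline figure is the sum of the first two buckets: with the chart's numbers, 69% + 20% = 89% of queries finish in under one second.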


Page 50: Solr pattern

Over time, we want to see this trend become steeper, which would indicate that queries are becoming shorter and that performance improvements are more noticeable.

Note: It’s harder to get queries into that 0-0.1s range, though it is questionable whether focusing on that leads to noticeable improvement.


Page 51: Solr pattern

Project Definition (start!)

Precursor Work

Prototype

Implementation

Testing/QA (repeats!)

Deployment

Ongoing Tuning (maximize value of investment)
