Jazzed about Solr - People as A Search Problem

About Solr People as A Search Problem Thursday, May 26, 2011

Upload: lucidworks

Post on 01-Nov-2014


Page 1: Jazzed about Solr - People as A Search Problem

About Solr: People as A Search Problem

Thursday, May 26, 2011

About Me

• Building websites since 1996, Java since 1997

• Prior web search experience
• Building and scaling eHarmony products since 2002

What is Jazzed?

• Subscription-Based Dating Site

• Incubated by eHarmony

• Create a profile
• Search for others
• View their photos
• Communicate privately

How is it different?

• Covers a broader range of relationships
• Easy to get started
• Real profiles, screened by machines and humans
• Fast, effective search-oriented tools

Jazzed Stats

• Started Fall 2009
• Beta Summer 2010
• Launched October 2010
• 100,000s of Profiles
• 1,000s of Searches Daily

Jazzed Architecture

• Event-driven SOA
• REST, JSON, EIP, Not-only-SQL
• Technology incubation

Tech Stack

• Java 6, Spring 3, Jersey 1.1, JMS (AMQP)

• RHEL 4, Oracle 11g, Voldemort 0.81, Solr 1.4.1, NFS

Not Covered

• Distributed Search
• Caching Strategies
• Data Import
• Analyzers/Tokenizers

Why Lucene?

• Proven, Solid IR library
• Prefer Open Source Solutions
• Not Only SQL
• Flexible Ranking
• Pluggable

Why Solr?

• Performant, Extensible, RESTful Service
• Configuration, Schema, Multicores
• Admin Interface
• Replication, Backups, Monitoring

Open Source

• Strengthens Engineering Team
• Be a part of a great community
• Not Brochure-ware

Not Only SQL

• One solution does not fit all
• Prefer availability over consistency
• Horizontal Scaling over Vertical

Flexible Ranking

• Query Strategies
  • Boolean Algebra
  • Vector Space Analysis
  • Hybrids
• Extensive Function Support
• Index and Query Boosting

...Oh My!

• Standard Plugins - Geospatial*, Faceting, Spelling, MoreLikeThis
• Full Text with Highlighted Results
• Client agnostic

Inevitable Question

• “Does it scale?”
• Solr POC Benchmark
  • 10 million profiles
  • >200 queries/sec, under 100 ms at the 90th percentile
  • Default tuning up to 5 million profiles
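
The benchmark's latency target reads as: 90% of queries completed in under 100 ms. A minimal sketch of checking that from logged latencies with the nearest-rank percentile method; the class, the method name, and the sample values are hypothetical, not the benchmark's actual data or tooling.

```java
import java.util.Arrays;

/** Nearest-rank percentile of logged query latencies (hypothetical helper). */
class LatencyPercentile {

    /** Returns the p-th percentile (0 < p <= 100) of the samples using the nearest-rank method. */
    static long percentile(long[] latenciesMs, double p) {
        if (latenciesMs.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need at least one sample and 0 < p <= 100");
        }
        long[] sorted = latenciesMs.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        // Nearest rank: the smallest sample with at least p% of all samples at or below it
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Made-up latencies for illustration; these are not the benchmark's measurements
        long[] samples = {12, 48, 35, 95, 20, 61, 88, 27, 40, 130};
        long p90 = percentile(samples, 90);
        System.out.println("p90 = " + p90 + " ms, under 100 ms: " + (p90 < 100));
    }
}
```

A percentile is a better fit for "most users see fast results" than an average, because a handful of slow queries (the 130 ms sample above) cannot drag it around.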

Profile Service

• RESTful Hybrid Data Service
• Public, Private, Attributes
• Event Producer

Profiles

• Mostly structured
• Categories - Eye Color, Desired Ethnicity
• Dates - Birthdate
• Numbers - Coordinates, Age Range
• Text - Name, Headline

Inverting People

• Stored as an inverted index
• Index is randomly accessed by term

Term        Documents
MALE        1, 3, 5, 7, 9
FEMALE      2, 4, 6, 8, 10
HAIR_RED    8
HAIR_BLOND  1, 2, 5, 6
EYE_BLUE    1, 2, 3, 10
EYE_BROWN   4, 5, 6, 7, 8, 9
fun         1, 3, 7, 9
funny       2, 4, 6, 10
beach       1, 2, 3, 4, 5, 6, 7, 8
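
The table above can be modeled directly: each term points to a sorted posting list, and a query made of required terms is the intersection of those lists. A toy Java sketch, assuming in-memory sets; Lucene's real postings are compressed on disk and intersected far more efficiently, so this only illustrates the idea. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Toy inverted index: each term maps to the sorted ids of the profiles containing it. */
class InvertedIndex {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    void add(String term, Integer... docIds) {
        postings.computeIfAbsent(term, t -> new TreeSet<>()).addAll(Arrays.asList(docIds));
    }

    /** Random access by term; an unknown term has an empty posting list. */
    SortedSet<Integer> docs(String term) {
        return postings.getOrDefault(term, Collections.emptySortedSet());
    }

    /** Boolean AND of required terms: intersect their posting lists. */
    SortedSet<Integer> and(String... terms) {
        SortedSet<Integer> result = null;
        for (String term : terms) {
            if (result == null) {
                result = new TreeSet<>(docs(term));   // copy so the index itself is never modified
            } else {
                result.retainAll(docs(term));
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    /** The ten example profiles from the table above. */
    static InvertedIndex sample() {
        InvertedIndex index = new InvertedIndex();
        index.add("MALE", 1, 3, 5, 7, 9);
        index.add("FEMALE", 2, 4, 6, 8, 10);
        index.add("HAIR_RED", 8);
        index.add("HAIR_BLOND", 1, 2, 5, 6);
        index.add("EYE_BLUE", 1, 2, 3, 10);
        index.add("EYE_BROWN", 4, 5, 6, 7, 8, 9);
        index.add("fun", 1, 3, 7, 9);
        index.add("funny", 2, 4, 6, 10);
        index.add("beach", 1, 2, 3, 4, 5, 6, 7, 8);
        return index;
    }

    public static void main(String[] args) {
        InvertedIndex index = sample();
        System.out.println(index.and("FEMALE", "HAIR_BLOND"));      // [2, 6]
        System.out.println(index.and("EYE_BROWN", "beach", "fun")); // [7]
    }
}
```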

Schema Design

• Single “Table”
• One-to-many = multi-value fields
• Individual vs Composite Fields
  • copyField and have both!

Field considerations

• Stored or not
• Indexed or not
• Multivalued - desires fields
• Type

Solr Types Used

• tdate, tint, tfloat* - birthdate, loginAt
• text - all text
• string - id, non-indexed text
• random - good for random sorts
• enum - for all enumerations

The ‘t’ is for Trie

Data Duplication

• By function - numberPhotos & hasPhotos
• By relationship - hiddenBy & hidden
• By analysis - name & text
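
Duplication by function and by analysis can be done when the document is built for indexing. A minimal sketch, assuming a plain Map stands in for the document (with SolrJ this would typically be a SolrInputDocument) and that the profile carries the name and headline fields listed under Profiles; the builder class is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Flattens one profile into a Solr-style field map, duplicating data where it helps search. */
class ProfileDocumentBuilder {

    static Map<String, Object> toDocument(String id, String name, String headline, int numberPhotos) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", id);
        doc.put("name", name);
        doc.put("headline", headline);
        doc.put("numberPhotos", numberPhotos);
        // By function: a boolean field turns "only profiles with photos" into a simple term filter
        doc.put("hasPhotos", numberPhotos > 0);
        // By analysis: a multi-valued catch-all field for full-text search; in a real schema
        // this is usually populated by <copyField> rather than by client code
        doc.put("text", List.of(name, headline));
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(toDocument("42", "Sam", "Loves the beach", 3));
    }
}
```

The individual fields stay available for exact matching or display, while the composite field gets the text analysis, which is the "have both" trade-off from Schema Design.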

Saving Profiles

• Updating is an in-memory operation
• No partial updates
• Commit flushes index changes
• Autocommit on maxDocs, maxTime, or both
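
With autocommit configured on both limits, a commit fires as soon as either one is reached. In Solr this is set in solrconfig.xml with an <autoCommit> element holding <maxDocs> and <maxTime> (in milliseconds). The Java below only models that decision rule for illustration; it is not Solr's implementation, and the class name is hypothetical.

```java
/**
 * Models "autocommit on maxDocs, maxTime, or both": commit as soon as either
 * configured limit is reached. A limit <= 0 means that limit is not configured.
 */
class AutoCommitPolicy {
    private final int maxDocs;
    private final long maxTimeMs;

    AutoCommitPolicy(int maxDocs, long maxTimeMs) {
        this.maxDocs = maxDocs;
        this.maxTimeMs = maxTimeMs;
    }

    /**
     * @param pendingDocs         documents added since the last commit
     * @param msSinceFirstPending time elapsed since the oldest uncommitted add
     */
    boolean shouldCommit(int pendingDocs, long msSinceFirstPending) {
        if (pendingDocs == 0) {
            return false;   // nothing to flush
        }
        boolean docsReached = maxDocs > 0 && pendingDocs >= maxDocs;
        boolean timeReached = maxTimeMs > 0 && msSinceFirstPending >= maxTimeMs;
        return docsReached || timeReached;
    }

    public static void main(String[] args) {
        AutoCommitPolicy policy = new AutoCommitPolicy(1000, 60_000);   // illustrative limits
        System.out.println(policy.shouldCommit(1000, 5));     // true: maxDocs reached
        System.out.println(policy.shouldCommit(10, 60_000));  // true: maxTime reached
        System.out.println(policy.shouldCommit(10, 5));       // false: neither limit reached
    }
}
```

maxTime bounds how stale search results can get during quiet periods, while maxDocs bounds the amount of uncommitted work during bursts of profile saves.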

Why Also Voldemort

• Private profiles cannot be stale
• Many fields are not searchable or viewable by others
• Isolate queries from fetch-by-id

Querying

• Superset of Lucene query syntax
• Efficient Range Queries
• Multiple Query Handlers
  • Dismax, Boost, Geo

Recall vs Precision

• Focus on recall while the corpus is small
• Focus on precision once it reaches critical mass

Boolean Queries

• Default operator set to AND
• +gender:FEMALE +seeking:MALE +eyeColor:EYE_BLUE +hairColor:(HAIR_RED OR HAIR_BLOND)
• Sort order is important
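
Required clauses like these are easy to assemble from a user's search form. A minimal sketch, assuming each form field maps to a Solr field with enum-like values that need no escaping (free text would need escaping first, for example with SolrJ's ClientUtils.escapeQueryChars). When a field accepts several values, the builder joins them with an explicit OR, since under a default operator of AND a bare group such as (HAIR_RED HAIR_BLOND) would require both. The class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/** Builds a Lucene-syntax query in which every criterion is a required (+) clause. */
class BooleanQueryBuilder {

    /** criteria: field -> acceptable values; iteration order decides clause order. */
    static String build(Map<String, List<String>> criteria) {
        StringJoiner query = new StringJoiner(" ");
        for (Map.Entry<String, List<String>> criterion : criteria.entrySet()) {
            List<String> values = criterion.getValue();
            if (values.isEmpty()) {
                continue;   // no constraint on this field
            }
            String clause = values.size() == 1
                    ? values.get(0)
                    // explicit OR, because with a default operator of AND "(A B)" demands both values
                    : "(" + String.join(" OR ", values) + ")";
            query.add("+" + criterion.getKey() + ":" + clause);
        }
        return query.toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> criteria = new LinkedHashMap<>();
        criteria.put("gender", List.of("FEMALE"));
        criteria.put("seeking", List.of("MALE"));
        criteria.put("eyeColor", List.of("EYE_BLUE"));
        criteria.put("hairColor", List.of("HAIR_RED", "HAIR_BLOND"));
        System.out.println(build(criteria));
        // +gender:FEMALE +seeking:MALE +eyeColor:EYE_BLUE +hairColor:(HAIR_RED OR HAIR_BLOND)
    }
}
```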

Hybrid Queries

• Default operator set to OR
• +gender:FEMALE +seeking:MALE eyeColor:EYE_BLUE hairColor:(HAIR_RED HAIR_BLOND)

Why you’re lucky if you like redheads

• Inverse Document Frequency (IDF)

• Rarer is favored over more common

• More fields matched = higher ranking

1. Blue-eyed redheads
2. Blue-eyed blonds
3. Redheads
4. Blonds

Boosting

• Query-time boosting by importance
• eyeColor:EYE_BLUE^2 hairColor:HAIR_BLOND
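
Why the redheads win, and how a boost changes the picture, can be reproduced with a simplified score: the sum of boost * idf over the query terms a profile matches. The idf formula is the one used by Lucene's classic DefaultSimilarity, 1 + ln(numDocs / (docFreq + 1)); document frequencies and the corpus size come from the 10-profile index example (HAIR_RED appears once, EYE_BLUE and HAIR_BLOND four times each). The four profiles and the class name are hypothetical, and real Lucene scoring also squares idf and applies tf, length norms, queryNorm and a coord factor that rewards matching more clauses, so treat this as an explanation of the ordering rather than Lucene's exact numbers.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Simplified IDF ranking: score = sum of (boost * idf) over the query terms a profile matches. */
class IdfRanking {

    /** Document frequencies and corpus size from the 10-profile index example. */
    static final Map<String, Integer> DOC_FREQS = Map.of("EYE_BLUE", 4, "HAIR_BLOND", 4, "HAIR_RED", 1);
    static final int NUM_DOCS = 10;

    /** Lucene classic (DefaultSimilarity) idf: rarer terms get larger values. */
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    static double score(Set<String> profileTerms, Map<String, Double> queryBoosts) {
        double total = 0;
        for (Map.Entry<String, Double> clause : queryBoosts.entrySet()) {
            if (profileTerms.contains(clause.getKey())) {
                total += clause.getValue() * idf(DOC_FREQS.getOrDefault(clause.getKey(), 0), NUM_DOCS);
            }
        }
        return total;
    }

    /** Profile labels sorted by descending score. */
    static List<String> rank(Map<String, Set<String>> profiles, Map<String, Double> queryBoosts) {
        List<String> labels = new ArrayList<>(profiles.keySet());
        labels.sort(Comparator.comparingDouble(
                (String label) -> score(profiles.get(label), queryBoosts)).reversed());
        return labels;
    }

    /** Hypothetical profiles, each described by the terms it contains. */
    static Map<String, Set<String>> sampleProfiles() {
        Map<String, Set<String>> profiles = new LinkedHashMap<>();
        profiles.put("blond", Set.of("HAIR_BLOND"));
        profiles.put("redhead", Set.of("HAIR_RED"));
        profiles.put("blue-eyed blond", Set.of("EYE_BLUE", "HAIR_BLOND"));
        profiles.put("blue-eyed redhead", Set.of("EYE_BLUE", "HAIR_RED"));
        return profiles;
    }

    public static void main(String[] args) {
        // eyeColor:EYE_BLUE hairColor:(HAIR_RED HAIR_BLOND) with the default operator OR
        Map<String, Double> plain = Map.of("EYE_BLUE", 1.0, "HAIR_RED", 1.0, "HAIR_BLOND", 1.0);
        System.out.println(rank(sampleProfiles(), plain));
        // eyeColor:EYE_BLUE^2 hairColor:HAIR_BLOND
        Map<String, Double> boosted = Map.of("EYE_BLUE", 2.0, "HAIR_BLOND", 1.0);
        System.out.println(rank(sampleProfiles(), boosted));
    }
}
```

Unboosted, the rarer HAIR_RED term puts redheads ahead of blonds, which reproduces the four-way ordering on the redheads slide; with EYE_BLUE^2, eye color dominates and blue-eyed blonds move to the top.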

Filter Fields

• Useful for roles and other lists
• -hidden:(2 4 6)
• -hiddenBy:1

id  hidden
1   2, 4, 6
2   1

id  hiddenBy
1   2
2   1
4   1
6   1
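
The two tables suggest hiddenBy is simply hidden inverted: profile 1 hid profiles 2, 4 and 6, so each of those carries hiddenBy 1, and a search by profile 1 can drop them with the single term -hiddenBy:1. A minimal sketch of deriving the field at index time; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Derives each profile's hiddenBy values by inverting everyone's hidden lists. */
class HiddenByInverter {

    /** hidden: profile id -> ids it has hidden. Returns: profile id -> ids that have hidden it. */
    static Map<Integer, SortedSet<Integer>> invert(Map<Integer, List<Integer>> hidden) {
        Map<Integer, SortedSet<Integer>> hiddenBy = new TreeMap<>();
        hidden.forEach((hider, targets) -> {
            for (Integer target : targets) {
                // "hider hid target" is recorded on the target as "hiddenBy hider"
                hiddenBy.computeIfAbsent(target, id -> new TreeSet<>()).add(hider);
            }
        });
        return hiddenBy;
    }

    /** Filter clause for a searcher: one term instead of a list of ids. */
    static String excludeHiddenBy(int searcherId) {
        return "-hiddenBy:" + searcherId;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> hidden = Map.of(1, List.of(2, 4, 6), 2, List.of(1));
        System.out.println(invert(hidden));      // {1=[2], 2=[1], 4=[1], 6=[1]}
        System.out.println(excludeHiddenBy(1));  // -hiddenBy:1
    }
}
```

One cost of this design: when a user hides someone, the hidden profile's document changes too, and with no partial updates it has to be re-sent in full.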

Date Math

• Simplifies query preprocessing
• +birthDate:[NOW/DAY+1DAY-36YEAR TO NOW/DAY-25YEAR]

Between 25 and 35 years old
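
To see what the date math resolves to (for logging, or when testing queries), the same arithmetic can be done with java.time. A sketch, assuming the reference day is taken in UTC, since Solr evaluates NOW and /DAY rounding in UTC; the class is hypothetical, and Solr itself does this work when the expression is sent as-is.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

/** Client-side mirror of +birthDate:[NOW/DAY+1DAY-36YEAR TO NOW/DAY-25YEAR] (hypothetical helper). */
class AgeRange {

    /** Oldest allowed birth date for age <= maxAge: today + 1 day - (maxAge + 1) years. */
    static LocalDate earliestBirthDate(LocalDate today, int maxAge) {
        return today.plusDays(1).minusYears(maxAge + 1);
    }

    /** Youngest allowed birth date for age >= minAge: today - minAge years. */
    static LocalDate latestBirthDate(LocalDate today, int minAge) {
        return today.minusYears(minAge);
    }

    /** Renders the inclusive range in Solr's UTC date format at midnight. */
    static String query(LocalDate today, int minAge, int maxAge) {
        return "+birthDate:[" + earliestBirthDate(today, maxAge) + "T00:00:00Z TO "
                + latestBirthDate(today, minAge) + "T00:00:00Z]";
    }

    public static void main(String[] args) {
        // Solr's NOW/DAY rounds in UTC, so take "today" in UTC as well
        System.out.println(query(LocalDate.now(ZoneOffset.UTC), 25, 35));
        // For the talk's date this prints
        // +birthDate:[1975-05-27T00:00:00Z TO 1986-05-26T00:00:00Z]
        System.out.println(query(LocalDate.of(2011, 5, 26), 25, 35));
    }
}
```

On 2011-05-26, someone born 1986-05-26 has just turned 25 and is included, while someone born 1975-05-26 has just turned 36 and falls one day before the lower bound.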

Distance Searching

• lat, lon, distance
• SolrLocal by Patrick O’Leary
• Additional overhead of ~90ms per query
• Superseded in Solr 3.1
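
Distance between a searcher and a profile is usually computed as great-circle distance from lat/lon. A haversine sketch, assuming a spherical Earth with a mean radius of 6371 km; this is a standard formula shown for illustration, not the code inside SolrLocal or Solr 3.1's spatial support, and the class name is hypothetical.

```java
/** Great-circle distance between two lat/lon points using the haversine formula. */
class Haversine {
    static final double EARTH_RADIUS_KM = 6371.0;   // mean Earth radius

    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        // Clamp guards against rounding pushing sqrt(a) slightly above 1 for antipodal points
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    public static void main(String[] args) {
        // Los Angeles to San Francisco, roughly 559 km
        System.out.printf("%.0f km%n", distanceKm(34.0522, -118.2437, 37.7749, -122.4194));
    }
}
```

Evaluating a trigonometric formula per candidate is part of why distance search adds per-query overhead; a cheap bounding-box range filter on lat and lon before computing exact distances is a common way to limit the work.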

Testing Queries

• Log queries and ids returned
• Version your search strategies
• Improve one thing at a time
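
Logging each query along with the ids it returned makes it possible to compare two versions of a search strategy on the same queries. A minimal sketch of one such comparison, the share of the old version's top-k ids that the new version still returns; the metric, the class and method names, and the sample ids are assumptions, not something the talk prescribes.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Compares the ids logged for one query under two versions of a search strategy. */
class StrategyDiff {

    /** Share of the baseline's top-k ids that also appear in the candidate's top-k. */
    static double overlapAtK(List<String> baseline, List<String> candidate, int k) {
        List<String> base = baseline.subList(0, Math.min(k, baseline.size()));
        if (base.isEmpty()) {
            return 1.0;   // nothing in the baseline that could have been lost
        }
        Set<String> top = new HashSet<>(candidate.subList(0, Math.min(k, candidate.size())));
        long kept = base.stream().filter(top::contains).count();
        return (double) kept / base.size();
    }

    public static void main(String[] args) {
        // Hypothetical log entries for the same query under strategy v1 and v2
        List<String> v1 = List.of("17", "4", "23", "8", "42");
        List<String> v2 = List.of("4", "17", "99", "8", "51");
        System.out.println(overlapAtK(v1, v2, 5));   // 0.6
    }
}
```

When only one thing changed between versions, a drop in overlap can be attributed to that change, which is much harder when several tweaks ship together.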

Geo Service

• Read-mostly service
• Fields - Postal Code, Country, State, Cities, Lat, Lon
• Usage - Registration Validation, City Selection

Operations

• Servlet container and filesystem
• Jetty 6, 64-bit Java 6 JVM
• 8G heap, -XX:+UseCompressedOops

Operations

• Active/Passive
• Layer 7 Load balancing
• Nightly snapshots
• Eventually SolrCloud

Multicore

• Run multiple schemas on the same instance
• Hot swappable for backwards-compatible changes
• Private / public profiles

Security

• No security provided
• At minimum, secure your UpdateHandler
• Separate Cores

Anyone who can reach an unsecured update handler can post this, which deletes every document:

<delete><query>*:*</query></delete>

Future

• Solr 3.1
• Mutual Matching
• Faceting / Guided Search
• Incorporating spelling
• Hierarchies, categories, better ranking models

Faceting

• Returns counts with query results
• Efficient
• Guides the user toward precision
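
A facet count is, for each value of a field, the number of documents in the current result set that carry that value. A minimal sketch that intersects each value's posting list with the results, using the gender and hair color postings from the earlier inverted-index table; it is similar in spirit to Solr's facet.method=enum but is not its implementation, and the class name is hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Facet counts: how many of the current results carry each value of a field. */
class FacetCounter {

    /** valuePostings: field value -> ids of all documents with that value (from the index). */
    static Map<String, Integer> facet(Set<Integer> results, Map<String, Set<Integer>> valuePostings) {
        Map<String, Integer> counts = new TreeMap<>();
        valuePostings.forEach((value, docs) -> {
            int n = 0;
            for (Integer doc : docs) {
                if (results.contains(doc)) {
                    n++;
                }
            }
            if (n > 0) {
                counts.put(value, n);   // skip zero counts, like facet.mincount=1
            }
        });
        return counts;
    }

    public static void main(String[] args) {
        Set<Integer> female = Set.of(2, 4, 6, 8, 10);   // results of +gender:FEMALE
        Map<String, Set<Integer>> hair = Map.of(
                "HAIR_RED", Set.of(8),
                "HAIR_BLOND", Set.of(1, 2, 5, 6));
        System.out.println(facet(female, hair));       // {HAIR_BLOND=2, HAIR_RED=1}
    }
}
```

Showing "HAIR_BLOND (2), HAIR_RED (1)" next to the results lets a user narrow the search by clicking a value, which is how faceting guides them toward precision.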

Thank you

[email protected]

Twitter: @jtuberville
