zschiedrich olaf - solr @ ebay kleinanzeigen - read only

Solr @ eBay Kleinanzeigen Olaf Zschiedrich, eBay Classifieds Group [email protected], 5/25/2011

Upload: lucid-imagination

Post on 29-Jan-2016


TRANSCRIPT

Page 1: Zschiedrich Olaf - Solr @ Ebay Kleinanzeigen - read only

Solr @ eBay Kleinanzeigen

Olaf Zschiedrich, eBay Classifieds Group [email protected], 5/25/2011

Page 2: Zschiedrich Olaf - Solr @ Ebay Kleinanzeigen - read only

Who am I?
• Olaf Zschiedrich
• eBay Classifieds Group
• Head of Technology @ eBay Kleinanzeigen
• Areas of expertise/interest:
  • High-traffic web applications
  • Agile development
  • Java/JEE
  • Search technologies

Page 3: Zschiedrich Olaf - Solr @ Ebay Kleinanzeigen - read only

Agenda
• About eBay Classifieds Group / eBay Kleinanzeigen
• Metrics & Traffic Numbers
• Why Solr?
• Solr Features in Action
• Data Indexing
• Solr in Production
• Best Practices
• Problems
• Outlook
• Questions

Page 4: Zschiedrich Olaf - Solr @ Ebay Kleinanzeigen - read only

About eBay Classifieds Group


Page 5: Zschiedrich Olaf - Solr @ Ebay Kleinanzeigen - read only

About eBay Classifieds Group

online classifieds company in the world


Page 6: Zschiedrich Olaf - Solr @ Ebay Kleinanzeigen - read only

About eBay Kleinanzeigen
• Typical classifieds ad platform (horizontal, local trading)
• Launched in 2009 after 4 months of development
• Small agile team (using Scrum)
  • 12-15 people total
  • 5-7 developers
• Leverages open source (Spring, Solr, MySQL, ActiveMQ)
• Applications:
  • Public website
  • Customer support tool
  • API (REST, supporting JSON and XML)
  • iPhone app (~250,000 installations)
  • Facebook app

Page 7: Zschiedrich Olaf - Solr @ Ebay Kleinanzeigen - read only

Metrics & Traffic Numbers
• Site metrics:
  • ~3.2 M active ads
  • 16-24 M page views (PVs) per day
  • Peak hours = 1.8 M PVs (~500 PVs per second)
• Solr request metrics:
  • ~60 M requests per day
  • Peak hours = ~1,500 requests per second
• Average response time:
  • 20 ms for search, 3 ms for auto-suggest

The site is growing rapidly!
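As a quick sanity check on the figures above, the per-second rates can be recomputed from the totals. This is only a sketch: it assumes that the 1.8 M page views refer to a single peak hour and that the 60 M Solr requests are spread over a 24-hour day. The class and method names are invented for illustration.

```java
public class TrafficMath {

    /** Average rate per second for a count observed over a time window. */
    public static double perSecond(double count, double windowSeconds) {
        return count / windowSeconds;
    }

    public static void main(String[] args) {
        // Assumption: 1.8 M page views fall within one peak hour (3,600 s).
        double peakPvPerSecond = perSecond(1_800_000, 3_600);
        // Assumption: 60 M Solr requests are spread over one day (86,400 s).
        double avgSolrPerSecond = perSecond(60_000_000, 86_400);
        // The slide quotes ~1,500 Solr requests per second at peak.
        double peakToAverage = 1_500 / avgSolrPerSecond;
        double solrRequestsPerPageView = 1_500 / peakPvPerSecond;

        System.out.printf("Peak page views per second: %.0f%n", peakPvPerSecond);
        System.out.printf("Average Solr requests per second: %.0f%n", avgSolrPerSecond);
        System.out.printf("Peak-to-average Solr load: %.2f%n", peakToAverage);
        System.out.printf("Solr requests per page view at peak: %.1f%n", solrRequestsPerPageView);
    }
}
```

Under these assumptions the peak Solr load is roughly twice the daily average, and each page view at peak triggers about three Solr requests.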

Page 8: Zschiedrich Olaf - Solr @ Ebay Kleinanzeigen - read only

Why Solr?
• Open source
• Good documentation / big community
• Java-based (the language we know and use)
• Widely used (especially Lucene)
• Based on Lucene (de facto standard for full-text search in Java)
• Feature-rich (including enterprise features)
• Extensible (e.g. easy implementation of custom tokenizers)
• Easy to integrate (HTTP, SolrJ client)
• Easy to set up (Java web application)

Most promising option we looked at. Due to very aggressive timelines, no time-consuming research was possible.

Page 9: Zschiedrich Olaf - Solr @ Ebay Kleinanzeigen - read only

Solr Features in Action
• Faceting
• Language-specific stemming
• More Like This
• Auto-suggest based on the TermsComponent
• Spellchecking
• Synonyms
• Stopwords
• Dynamic fields

Page 10: Zschiedrich Olaf - Solr @ Ebay Kleinanzeigen - read only

Data Indexing
• Use of the DataImportHandler in delta-import mode ("Delta Import Handler")
• Delta import runs every 10 minutes
• Full import only done when a schema change requires a full index rebuild
• Index optimized once a day

[Diagram: MySQL slave → (JDBC, Delta Import Handler) → Solr master → (HTTP / REST API, Replication Handler) → Solr slaves]
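The slides do not say how the 10-minute delta import is triggered. One possible approach, sketched here, is a small scheduler that calls the DataImportHandler over HTTP with `command=delta-import`. The host name, the core name `ads`, and the handler path `/dataimport` are assumptions, not details from the talk; a cron job calling the same URL would work equally well.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class DeltaImportScheduler {

    /** Builds the delta-import URL; "/dataimport" is the handler path assumed here. */
    public static URI deltaImportUri(String solrBaseUrl, String core) {
        return URI.create(solrBaseUrl + "/" + core + "/dataimport?command=delta-import");
    }

    /** Sends one delta-import request and returns the HTTP status code. */
    public static int trigger(URI uri) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
        try {
            conn.setRequestMethod("GET");
            conn.setConnectTimeout(5_000);
            conn.setReadTimeout(30_000);
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    /** Schedules a delta import every 10 minutes, the interval given on the slide. */
    public static ScheduledExecutorService start(URI uri) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                System.out.println("delta-import HTTP status: " + trigger(uri));
            } catch (IOException e) {
                // Report the failure and let the next scheduled run retry.
                System.err.println("delta-import failed: " + e.getMessage());
            }
        }, 0, 10, TimeUnit.MINUTES);
        return scheduler;
    }

    public static void main(String[] args) {
        // Hypothetical master host and core name.
        URI uri = deltaImportUri("http://solr-master.example:8983/solr", "ads");
        System.out.println("Would trigger: " + uri);
        // start(uri); // uncomment to poll a running Solr master
    }
}
```

Only the master is addressed here; the slaves would pick up the changed index through replication, as in the diagram.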

Page 11: Zschiedrich Olaf - Solr @ Ebay Kleinanzeigen - read only

Solr in Production
• 2 datacenters
• 1 master + 6 slaves per datacenter
  Slaves show very low resource consumption. We could go down to 4 slaves per datacenter and still have 50% overcapacity.
• Master only used for indexing
• Load balancer in front of slaves
• Varnish in front of slaves (for dedicated use cases)
• Working closely with the SITE-OPS team
• DEV-OPS is part of the development process

Page 12: Zschiedrich Olaf - Solr @ Ebay Kleinanzeigen - read only

Solr 3.1 in Production
• Solr 3.1 in production since mid-May
• Not plug and play. Needs a migration path because:
  • Index format has changed
  • Javabin format has changed
• Two major problems:
  • Bug in spellchecker (SOLR-2462): leads to infinite GC loops
  • Bug in replication handler (SOLR-2469): leads to growing disk usage, as old index files are not removed when "replicateAfter=startup" is used

Page 13: Zschiedrich Olaf - Solr @ Ebay Kleinanzeigen - read only

Best Practices
• Use Solr cores right from the beginning
  Allows you to run multiple indexes on one box in dev and distribute indexes to multiple boxes in production
• Use filter queries
• Use caching (FieldCache, QueryCache, web proxy cache such as Varnish or Squid)
• Tune the JVM properly
• Build a search layer that hides the usage of Solr

    SearchCommand cmd = new SearchCommand();
    cmd.setKeywords("BMW 323");
    ...
    SearchResult result = searchService.searchActiveAds(cmd);
    List<Ad> ads = result.getAds();

• Create a QueryBuilder to ease query building

    SolrQueryBuilder sqb = new SolrQueryBuilder();
    sqb = sqb.freetext("freetext", "BMW").and().in("color", "RED", "BLACK");
    sqb = sqb.and().not().eq("fuel_type", "GAS").and().lt("price", "10000");
    ...
    String query = sqb.build();

  (Just an example. Normally filter queries should be used for a query like this!)
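The talk shows only how the QueryBuilder is called, not how it is implemented. The following is a minimal sketch of what such a builder could look like, assuming it simply concatenates Lucene/Solr query syntax. The method names follow the slide; everything else, including the quoting rule and the exclusive range form used for `lt`, is an assumption and not the team's actual code.

```java
import java.util.StringJoiner;

public class SolrQueryBuilder {

    private final StringBuilder query = new StringBuilder();

    public SolrQueryBuilder freetext(String field, String text) {
        return append(field + ":" + quoteIfNeeded(text));
    }

    public SolrQueryBuilder eq(String field, String value) {
        return append(field + ":" + quoteIfNeeded(value));
    }

    /** Matches any of the given values: field:(A OR B). */
    public SolrQueryBuilder in(String field, String... values) {
        StringJoiner joiner = new StringJoiner(" OR ", field + ":(", ")");
        for (String value : values) {
            joiner.add(quoteIfNeeded(value));
        }
        return append(joiner.toString());
    }

    /** Strictly less than: an open-ended range with exclusive bounds. */
    public SolrQueryBuilder lt(String field, String upperBound) {
        return append(field + ":{* TO " + quoteIfNeeded(upperBound) + "}");
    }

    public SolrQueryBuilder and() {
        return append(" AND ");
    }

    public SolrQueryBuilder not() {
        return append("NOT ");
    }

    public String build() {
        return query.toString();
    }

    private SolrQueryBuilder append(String fragment) {
        query.append(fragment);
        return this;
    }

    /** Plain tokens pass through; anything else becomes a quoted phrase. */
    private static String quoteIfNeeded(String value) {
        if (value.matches("[A-Za-z0-9_.]+")) {
            return value;
        }
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        SolrQueryBuilder sqb = new SolrQueryBuilder();
        sqb = sqb.freetext("freetext", "BMW").and().in("color", "RED", "BLACK");
        sqb = sqb.and().not().eq("fuel_type", "GAS").and().lt("price", "10000");
        System.out.println(sqb.build());
    }
}
```

With this sketch, the chain from the slide builds the string `freetext:BMW AND color:(RED OR BLACK) AND NOT fuel_type:GAS AND price:{* TO 10000}`. As the slide notes, the color, fuel type, and price restrictions would normally be sent as filter queries rather than as part of the main query.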

Page 14: Zschiedrich Olaf - Solr @ Ebay Kleinanzeigen - read only

Problems
• Distance search including sorting
  • Not supported in previous Solr versions
  • LocalSolr: not working with Solr 1.4 final, GC issues, performance issues
  • Solution: got rid of sort by distance. Implemented our own distance search based on bounding boxes and simple range queries.
  • Solved in 3.1
• Real-time updates
• Deep paging through large result sets (SOLR-1726)
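The bounding-box workaround mentioned above can be sketched as follows: compute a latitude/longitude box around the search center and express it as two range filter queries. This is an illustration under stated assumptions, not the team's implementation. The field names `lat` and `lon` are hypothetical, the longitude width uses a simple cosine approximation, and the poles and the 180° meridian are ignored.

```java
import java.util.Locale;

public class BoundingBoxFilter {

    static final double EARTH_RADIUS_KM = 6371.0;

    /** Returns {minLat, maxLat, minLon, maxLon} in degrees for a radius around a point. */
    public static double[] boundingBox(double lat, double lon, double radiusKm) {
        double latDelta = Math.toDegrees(radiusKm / EARTH_RADIUS_KM);
        // Longitude degrees shrink with latitude, so the box must widen by 1 / cos(lat).
        double lonDelta = Math.toDegrees(
                radiusKm / (EARTH_RADIUS_KM * Math.cos(Math.toRadians(lat))));
        return new double[] {lat - latDelta, lat + latDelta, lon - lonDelta, lon + lonDelta};
    }

    /** Turns the box into two range queries suitable for Solr's fq parameter. */
    public static String[] filterQueries(String latField, String lonField, double[] box) {
        return new String[] {
            String.format(Locale.ROOT, "%s:[%.5f TO %.5f]", latField, box[0], box[1]),
            String.format(Locale.ROOT, "%s:[%.5f TO %.5f]", lonField, box[2], box[3])
        };
    }

    public static void main(String[] args) {
        // Example: ads within roughly 10 km of central Berlin.
        double[] box = boundingBox(52.52, 13.405, 10.0);
        for (String fq : filterQueries("lat", "lon", box)) {
            System.out.println("fq=" + fq);
        }
    }
}
```

A box returns some ads slightly farther away than the radius (in the corners), which is acceptable once sorting by distance has been dropped, as described on the slide.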

Page 15: Zschiedrich Olaf - Solr @ Ebay Kleinanzeigen - read only

Outlook / Future Plans
• Migrate further applications to Solr
  Most batch jobs and the customer support tool still search against the database, which is getting slower as the data grows.
• Evaluate new features of Solr 3.1
  • Spatial/distance search
  • New auto-suggest component
  • Extended DisMax query parser

Page 16: Zschiedrich Olaf - Solr @ Ebay Kleinanzeigen - read only

Questions?

Page 17: Zschiedrich Olaf - Solr @ Ebay Kleinanzeigen - read only

Contact
• Olaf Zschiedrich
  • [email protected]
  • [email protected]
  • www.ebay-kleinanzeigen.de