Magento Expert Consulting Group Webinar | July 31, 2013 Thinking Beyond Search with Solr Understanding How Solr Can Help Your Business Scale


Magento Expert Consulting Group Webinar | July 31, 2013

Thinking Beyond Search with Solr Understanding How Solr Can Help Your Business Scale

The presenters Magento Expert Consulting Group

Udi Shamay, Head, Expert Consulting Group

Steve Kukla, Business Solution Architect, Expert Consulting Group

Kirill Morozov, Application Architect, Expert Consulting Group

Today’s agenda

• What is Apache Solr?

• Business Use Cases for Scale
  • Supporting Initial Catalog Growth
  • Supporting Growing Traffic
  • Supporting Substantial Catalog Growth
  • Supporting A Real-Time Catalog

• Key Points to Remember

• Q&A

What is Apache Solr?

What is Apache Solr? General Solr Overview

Solr

• Separate application, installed on its own server or on an existing server in the environment, depending on business needs

• Solr uses schema configuration files, which can be found in Magento/lib/Apache

• Magento communicates with Solr via HTTP/XML

• Search options are configured via the Magento admin panel
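To make the HTTP/XML point concrete, here is a minimal Python sketch of reading a Solr XML response. The sample payload is hand-written in the general shape of Solr's XML response format, and the field names `id` and `name` are assumptions for illustration, not fields taken from the Magento schema.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of a Solr XML response (wt=xml).
# The fields "id" and "name" are hypothetical.
SAMPLE_RESPONSE = """<response>
  <result name="response" numFound="2" start="0">
    <doc>
      <str name="id">101</str>
      <str name="name">Blue T-Shirt</str>
    </doc>
    <doc>
      <str name="id">102</str>
      <str name="name">Red T-Shirt</str>
    </doc>
  </result>
</response>
"""


def parse_docs(xml_text):
    """Return each <doc> element as a dict of field name -> text value."""
    root = ET.fromstring(xml_text)
    docs = []
    for doc in root.iter("doc"):
        docs.append({field.get("name"): field.text for field in doc})
    return docs


docs = parse_docs(SAMPLE_RESPONSE)
```

In a real integration the XML would come back from an HTTP request to the Solr server rather than from a constant.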

What is Apache Solr? Solr the Search Platform

Better text-based searching provides a better customer experience

• More relevant “fuzzy” searching*

• Faceted searches

• Search corrections

• Out of the box type-ahead*

• Response caching for better performance

*Requires customization to leverage at 100%
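As a rough illustration of faceted searches and search corrections, the sketch below builds a Solr query URL using the standard `facet`, `facet.field`, and `spellcheck` request parameters. The base URL and the facet fields `color` and `brand` are assumptions, and `spellcheck=true` only has an effect if a spellcheck component is configured on the request handler.

```python
from urllib.parse import urlencode


def build_search_url(base_url, text, facet_fields):
    """Build a /select URL that asks for facets and spelling suggestions."""
    params = [
        ("q", text),
        ("wt", "xml"),      # XML response, as used between Magento and Solr
        ("facet", "true"),  # turn faceting on
    ]
    # One facet.field parameter per attribute to facet on.
    params += [("facet.field", field) for field in facet_fields]
    # Only useful when the handler has a spellcheck component configured.
    params.append(("spellcheck", "true"))
    return base_url + "/select?" + urlencode(params)


# Hypothetical host and facet fields.
url = build_search_url("http://localhost:8983/solr", "blue shirt", ["color", "brand"])
```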

What is Apache Solr? What Makes Solr Powerful

Solr is more than a search engine because…

• Most data customers see is handled by Solr instead of MySQL

• Solr uses a simpler data structure

Diagram: MySQL (EAV) spreads product_id, attribute_id, attribute_name, and attribute_value across several tables; Solr (No EAV) keeps product_id, attribute_name, and attribute_value together.
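The difference between the EAV layout and the flat Solr layout can be sketched in a few lines of Python. The table shapes and sample values below are simplified assumptions for illustration, not the actual Magento schema.

```python
def flatten_eav(attribute_names, attribute_values):
    """Join EAV rows into one flat document per product.

    attribute_names:  {attribute_id: attribute_name}
    attribute_values: iterable of (product_id, attribute_id, attribute_value)
    """
    docs = {}
    for product_id, attribute_id, value in attribute_values:
        # Create the product's document the first time we see it.
        doc = docs.setdefault(product_id, {"product_id": product_id})
        doc[attribute_names[attribute_id]] = value
    return list(docs.values())


# Hypothetical sample data.
names = {1: "name", 2: "color"}
values = [(101, 1, "T-Shirt"), (101, 2, "blue"), (102, 1, "Mug")]
flat_docs = flatten_eav(names, values)
```

Reading a product from the EAV side needs a join per attribute table, while the flat document can be returned as-is.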

• Solr supports replication, which allows it to truly scale for growth

Diagram: Magento and five Solr nodes

Supporting Initial Catalog Growth

Business Use Case Supporting Initial Catalog Growth

Business Background

• Growing catalog – from 10K to 100K SKUs

• From 1 to 2 stores

• From 1 to 2+ web nodes / 1 database node

• Using native Solr Search

Problems

• Increased indexing time

• Outdated information on the front end

Supporting Initial Catalog Growth Problem – Increasing Index Footprint

Catalog size     Control (1 website, 1 store view)    Year 2 (2 websites, 2 store views)
10,000 SKUs      1.75 min*                            3.5 min*
50,000 SKUs      10 min*                              17.5 min*
100,000 SKUs     17.5 min*                            35 min*

*Expected indexing time

Slow indexing

Supporting Initial Catalog Growth Solution – Custom Data Import Handler

Concept

• Connects to the database using JDBC

• Extra data transformations must be written in Java/JavaScript

• Uses a prepared XML configuration

Supporting Initial Catalog Growth Data Import Handler – Results

Results

• 10 times faster indexing

• Supports delta-indexing

Things to keep in mind

• Solr knows about its data source

• May require extra development efforts

• Extra data transformations must be written in Java/JavaScript
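A Data Import Handler run is started with an HTTP request, so delta-indexing can be triggered from a scheduler or script. The sketch below only builds the request URL. It assumes the handler is registered at the conventional `/dataimport` path, and the core URL is hypothetical; `command`, `clean`, and `commit` are standard DIH request parameters.

```python
from urllib.parse import urlencode

ALLOWED_COMMANDS = ("full-import", "delta-import", "status")


def dih_command_url(core_url, command, clean=False):
    """Build the URL that asks the Data Import Handler to run a command."""
    if command not in ALLOWED_COMMANDS:
        raise ValueError("unsupported DIH command: " + command)
    params = {"command": command}
    if command != "status":
        # clean=true would wipe the index before importing.
        params["clean"] = "true" if clean else "false"
        params["commit"] = "true"
    return core_url + "/dataimport?" + urlencode(params)


# Hypothetical core URL.
delta_url = dih_command_url("http://localhost:8983/solr/catalog", "delta-import")
```

A delta-import only re-reads rows that the delta queries in the DIH configuration report as changed, which is why it is cheaper than a full import.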

Supporting Growing Traffic

Business Use Case Supporting Growing Traffic

Business Background

• Growing catalog – 1,000,000 SKUs

• Growing traffic: up to 100 requests / second

• 3 stores

• 3+ web nodes / 1 database node

• Using Data Import Handler

Problem

• Solr can’t handle increasing user concurrency

Supporting Growing Traffic Increasing Index Footprint – OK

Catalog size       Control (2 websites, 2 store views)    Year 3 (3 websites, 3 store views)
100,000 SKUs       3.5 min*                               4.75 min*
500,000 SKUs       17.5 min*                              23.75 min*
1,000,000 SKUs     35 min*                                47.5 min*

*Expected indexing time

< 1000 updates/sec: indexing delta data handles updates

Supporting Growing Traffic Problem – Increased Response Time

Catalog size and load       Control (2 websites, 2 store views)    Year 3 (3 websites, 3 store views)
100,000 SKUs, 30 RPS        75 msec*                               80 msec*
500,000 SKUs, 60 RPS        95 msec*                               100 msec*
1,000,000 SKUs, 100 RPS     105 msec*                              120 msec*

*Expected average response time

Solr CPU is maxed out

Supporting Growing Traffic Solution – Solr Replication

Concept

• Separate reading requests

• Replicate index across multiple nodes

• Read from multiple servers

Supporting Growing Traffic Solr Replication – Results

Results

• Allows Solr to handle read traffic

• Introduces fail-over

Things to keep in mind

• Requires middleware or Magento customization

• Possible heavy data duplication

• Extra changes in infrastructure
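The middleware or Magento customization mentioned above has to decide which Solr node each request goes to. A minimal Python sketch of that routing, assuming one node that takes index updates and several read-only replicas (all URLs are hypothetical), might look like this:

```python
import itertools


class SolrRouter:
    """Send index updates to one node and spread searches over replicas."""

    def __init__(self, master_url, replica_urls):
        if not replica_urls:
            raise ValueError("at least one replica is required")
        self.master_url = master_url
        # Round-robin iterator over the read replicas.
        self._replicas = itertools.cycle(replica_urls)

    def url_for_update(self):
        # All writes go to the node that owns the index.
        return self.master_url + "/update"

    def url_for_search(self):
        # Each search goes to the next replica in turn.
        return next(self._replicas) + "/select"


# Hypothetical hosts.
router = SolrRouter(
    "http://solr-master:8983/solr",
    ["http://solr-r1:8983/solr", "http://solr-r2:8983/solr"],
)
```

This sketch does not skip unhealthy replicas; real fail-over would also need health checks or a load balancer in front of the replicas.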

Supporting Substantial Catalog Growth

Business Use Case Supporting Substantial Catalog Growth

Business Background

• Growing catalog – 5,000,000 SKUs

• 4 stores

• 4+ web nodes / 1 database node

• Using Data Import Handler

• Using Solr replication

Problems

• Delta-indexing delays

• Slow response time

Supporting Substantial Catalog Growth Problem – Increasing Index Footprint

Catalog size       Control (3 websites, 3 store views)    Year 4 (4 websites, 4 store views)
1,000,000 SKUs     47.5 min*                              63.5 min*
2,500,000 SKUs     118.75 min*                            158.75 min*
5,000,000 SKUs     237.5 min*                             317.5 min*

*Expected indexing time

> 1000 updates/sec: delta indexing delays

Supporting Substantial Catalog Growth Problem – Increased Response Time

Catalog size and load       Control (3 websites, 3 store views)    Year 4 (4 websites, 4 store views)
1,000,000 SKUs, 100 RPS     120 msec*                              150 msec*
2,500,000 SKUs, 200 RPS     230 msec*                              270 msec*
5,000,000 SKUs, 400 RPS     300 msec*                              400 msec*

*Expected average response time

Slow response time

Supporting Substantial Catalog Growth Solution – Index Sharding

Concept

• Distributed search

• Distributed + Replication (SolrCloud)

Supporting Substantial Catalog Growth Index Sharding – Results

Results

• Distributed search for faster response time

• 50 times faster indexing with 5 shards

Diagram: Magento, a MySQL database holding records A through I, and Solr shards that each hold a subset of those records

Things to keep in mind…

• Custom solution

• Requires Magento customization or middleware introduction

• Extra changes in infrastructure
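Sharding needs two pieces of routing logic: deciding which shard stores a given product, and telling Solr which shards to fan a search out to. The Python sketch below illustrates both under stated assumptions: the hash-based assignment is one simple option rather than the scheme used in the webinar, and the shard addresses are hypothetical. The `shards` request parameter is Solr's standard distributed search parameter.

```python
import zlib
from urllib.parse import urlencode


def shard_for(product_id, shard_count):
    """Pick a shard index for a product.

    crc32 is used because it is stable across processes, unlike
    Python's built-in hash() for strings.
    """
    return zlib.crc32(str(product_id).encode("utf-8")) % shard_count


def distributed_query_url(entry_url, shard_addresses, text):
    """Build a search URL that fans out to every listed shard."""
    params = {"q": text, "shards": ",".join(shard_addresses)}
    return entry_url + "/select?" + urlencode(params)


# Hypothetical shard addresses (host:port/path, without a scheme).
shards = ["solr1:8983/solr", "solr2:8983/solr"]
url = distributed_query_url("http://solr1:8983/solr", shards, "shirt")
```

With SolrCloud, document routing and query fan-out are handled by the cluster itself, so this kind of manual routing applies to the plain distributed search setup.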

Supporting A Real-Time Catalog

Business Use Case Supporting A Real-Time Catalog

Business Background

• Growing catalog – 10,000,000 SKUs

• 5 stores

• 5+ web nodes / 1 database node

• Data Import Handler

• SolrCloud and distributed search

Business Requirement

• Always up-to-date index

Supporting A Real-Time Catalog Solution – Listen To The MySQL Bin Log

Concept

• Connect via the MySQL replication protocol

• Listen to data-related events

Diagram: MySQL, Binlog, Replication, MySQL Slave

• Extract information from events

• Manipulate documents in the Lucene index

Diagram: MySQL, Binlog, Replication Listener, Log Parser, Solr
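The last two steps, extracting information from events and updating documents in the index, can be sketched as a small mapping function. The event dictionaries below are a hypothetical, already-parsed form of binlog row events (real listeners expose their own event types), and the column names are assumptions. The returned structures follow Solr's JSON update command format.

```python
def event_to_solr_command(event):
    """Translate one parsed row event into a Solr update command.

    `event` is a hypothetical dict: {"type": ..., "row": {...}}.
    """
    kind = event["type"]
    row = event["row"]
    doc_id = str(row["product_id"])
    if kind in ("insert", "update"):
        # Adding a document with an existing id replaces the old version.
        return {"add": {"doc": {"id": doc_id, "name": row["name"]}}}
    if kind == "delete":
        return {"delete": {"id": doc_id}}
    raise ValueError("unsupported event type: " + kind)


# Hypothetical events, as a log parser might emit them.
events = [
    {"type": "insert", "row": {"product_id": 7, "name": "Mug"}},
    {"type": "delete", "row": {"product_id": 3}},
]
commands = [event_to_solr_command(e) for e in events]
```

Each command would then be posted to the Solr update handler, followed by a commit so the change becomes visible to searches.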

Supporting A Real-Time Catalog Listening To The MySQL Bin Log – Results

Results

• Replication-like connection

• Indexes are always up-to-date

Things to keep in mind

• Relatively complex implementation

Diagram: Magento, a MySQL database holding records A through I, the bin log, and Solr shards that each hold a subset of those records

Key Points to Remember

Key Points to Remember Solr helps businesses scale

• Solr’s search capabilities provide a better site experience than MySQL LIKE or Full-text

• Solr is more than a search platform – it is a key for scalability and growth

• Solr’s data import handler keeps Solr performing well as your catalog grows

• Solr replication helps accommodate growing traffic

• Solr shards help keep indexing execution time and search response times low for very large catalogs

• Listening to the MySQL bin log can help facilitate a continuously updating catalog

References

Scaling Solr

• Solr Wiki: http://wiki.apache.org/solr/

• Type-Ahead: http://wiki.apache.org/solr/Suggester

• Data Import Handler (DIH): http://wiki.apache.org/solr/DataImportHandler

• Replication: http://wiki.apache.org/solr/SolrReplication

• Shard: http://wiki.apache.org/solr/SolrCloud

• Distributed Search: http://wiki.apache.org/solr/DistributedSearch

MySQL Replication Listening

• Change Data Capture: http://www.slideshare.net/mkindahl/binary-log-api-presentation-oscon-2011

• Replication Listener (C): https://launchpad.net/mysql-replication-listener

• Open-Replicator (Java): http://code.google.com/p/open-replicator/

Q&A
