Solr, the intelligent search engine
DESCRIPTION
Searching for products is a key operation for eCommerce sites, where both speed and flexibility are needed. Experience how Solr’s error-tolerant search helps the customers of House of Sound find their products.
SOLR, THE INTELLIGENT SEARCH ENGINE
Benoît Largeau
AGENDA:
Stakes | Introduction | Indexing | Scalability | Searching | Admin tools | Conclusion
WHAT ARE THE STAKES?
Considering:
- One user in two is a searcher: one visitor in two will use the internal search engine
- This searcher population converts more often than other visitors
- Searchers are less patient with browsing: they need to find products quickly, otherwise they leave for another shop
AN INTERNAL SEARCH ENGINE IS ESSENTIAL.
SEARCH > FIND > ADD TO CART > PAY
• Open source enterprise search server, initiated by CNET in 2004
Source code published openly in 2006
• Apache Lucene is the underlying engine
• Independent server communicating through standards such as HTTP / XML / JSON,
usable on any web project, such as those based on Magento
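Because Solr answers over plain HTTP with XML or JSON, any language can consume its results. The sketch below is a minimal illustration, assuming a JSON body shaped like Solr's standard /select output (responseHeader, response.numFound, response.docs); the product documents and their field names (id, name, price) are hypothetical.

```python
import json

# Hypothetical response body, shaped like the JSON returned by Solr's
# /select handler with wt=json. Field names and values are made up.
SAMPLE_RESPONSE = """
{
  "responseHeader": {"status": 0, "QTime": 3},
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {"id": "sku-1001", "name": "Studio headphones", "price": 149.0},
      {"id": "sku-1002", "name": "Headphone amplifier", "price": 89.5}
    ]
  }
}
"""


def product_names(body):
    """Extract the product names from a Solr JSON search response."""
    data = json.loads(body)
    # A non-zero status in the response header signals a failed request.
    if data["responseHeader"]["status"] != 0:
        raise RuntimeError("Solr reported an error")
    return [doc["name"] for doc in data["response"]["docs"]]


def hit_count(body):
    """Total number of matching documents, not just those on this page."""
    return json.loads(body)["response"]["numFound"]


if __name__ == "__main__":
    print(product_names(SAMPLE_RESPONSE))
    print(hit_count(SAMPLE_RESPONSE))
```

In a Magento frontend the same parsing would apply to the body fetched from the Solr server; only the transport differs.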
SOLR PROJECT.
INTRODUCTION TO SOLR.
SOME REFERENCES.
More references here: http://wiki.apache.org/solr/PublicServers
Indexing data:
- Index the whole site (including files, …)
- Tolerance (stemming, synonyms, …)
Searching data:
- Layered navigation
- Customizable relevance calculation
- Predictive search (different kinds)
- Stemming, plurals, synonyms, stop words, …
Admin tools:
- Display more statistics (most frequent requests or searches with no answer)
Scalability
FEATURES OFFERED BY SOLR.
INDEXING DATA.
Schema
Defines how to handle the structured data
sent by Magento (no crawler such as Nutch)
Typing data
Price & weight are floats, product name is a string, …
o Structured data in Solr allows faceted search,
to filter by price range for example
Typing is determined by the intended search behavior:
if we need to filter by price range,
-> prices have to be stored as floats and not strings to stay comparable
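Why typing matters can be shown with a small sketch: the same range filter run over prices compared as text and as numbers, plus a price-range facet count. The prices, the 10 to 200 range and the bucket sizes are hypothetical, and this is plain Python, not Solr's own range query or facet code.

```python
# Hypothetical prices, not taken from any real catalog.
PRICES = [9.99, 25.0, 120.0, 1500.0]


def filter_as_strings(prices, low, high):
    """Range filter on prices stored as text: comparison is lexicographic."""
    return [p for p in prices if str(low) <= str(p) <= str(high)]


def filter_as_floats(prices, low, high):
    """Range filter on prices stored as floats: comparison is numeric."""
    return [p for p in prices if low <= p <= high]


def price_facet(prices, start, end, gap):
    """Count prices per half-open bucket [lower, lower + gap), like a price-range facet."""
    counts = {}
    lower = start
    while lower < end:
        upper = lower + gap
        counts[(lower, upper)] = sum(1 for p in prices if lower <= p < upper)
        lower = upper
    return counts


if __name__ == "__main__":
    print(filter_as_strings(PRICES, 10, 200))  # [120.0, 1500.0]: 25.0 lost, 1500.0 wrongly kept
    print(filter_as_floats(PRICES, 10, 200))   # [25.0, 120.0]
    print(price_facet(PRICES, 0, 200, 100))    # {(0, 100): 2, (100, 200): 1}
```

As text, "25.0" sorts after "200" and "1500.0" sorts before it, so the string filter both drops a valid product and keeps an invalid one.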
Text analysis
Text is split into terms, which are then processed for stemming, synonyms, …
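A toy analysis chain helps to picture this step: tokenize, lowercase, drop stop words, stem, then expand synonyms. In Solr these steps are configured per field type (a tokenizer followed by filters); the stop words, synonym list and crude plural stemmer below are hypothetical stand-ins, and the order of the steps is a choice of this sketch.

```python
import re

# Hypothetical word lists. Synonym keys are written in stemmed form
# because this chain stems before expanding synonyms.
STOP_WORDS = {"a", "an", "and", "the", "for", "with"}
SYNONYMS = {"headphone": ["headset"], "speaker": ["loudspeaker"]}


def naive_stem(term):
    """Crude plural stripping ('speakers' -> 'speaker'), standing in for a real stemmer."""
    if len(term) > 3 and term.endswith("s") and not term.endswith("ss"):
        return term[:-1]
    return term


def analyze(text):
    """Split text into terms, then filter, stem and expand them."""
    terms = re.findall(r"\w+", text.lower())           # tokenize and lowercase
    terms = [t for t in terms if t not in STOP_WORDS]   # drop stop words
    terms = [naive_stem(t) for t in terms]              # stem (plurals only)
    expanded = []
    for t in terms:                                     # add synonyms
        expanded.append(t)
        expanded.extend(SYNONYMS.get(t, []))
    return expanded


def matches(query, document):
    """A document matches when query and document share an analyzed term."""
    return bool(set(analyze(query)) & set(analyze(document)))


if __name__ == "__main__":
    print(analyze("The Headphones and Speakers"))
    print(matches("headset", "Studio Headphones"))
```

Because the same chain runs on documents and queries, a search for "headset" finds a product named "Studio Headphones" even though the word never appears in it.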
SCHEMA & TEXT ANALYSIS.
Generally indexes structured data, e.g. products
Is also able to index binary formats
such as PDF, MS Office documents, images or music files,
using an interface called Solr Cell,
which is an adapter to Apache Tika
Apache Tika is a toolkit to detect and
extract metadata and text content from various documents
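Sending a file to Solr Cell is an ordinary HTTP POST to the extracting handler. A minimal sketch that builds (but does not send) such a request, assuming a Solr server on localhost:8983 with a hypothetical core named "products" and the /update/extract handler enabled; the document id and payload are made up.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: local Solr with a hypothetical core "products" whose
# /update/extract handler (Solr Cell) is enabled.
SOLR_CORE_URL = "http://localhost:8983/solr/products"


def build_extract_request(doc_id, payload, content_type):
    """Build a POST that hands a binary file to Solr Cell for extraction."""
    params = urlencode({
        "literal.id": doc_id,  # unique key stored alongside the extracted text
        "commit": "true",      # make the document searchable right away
    })
    return Request(
        f"{SOLR_CORE_URL}/update/extract?{params}",
        data=payload,
        headers={"Content-Type": content_type},
        method="POST",
    )


if __name__ == "__main__":
    req = build_extract_request("manual-42", b"%PDF-1.4 ...", "application/pdf")
    print(req.get_method(), req.full_url)
```

Against a running server, `urllib.request.urlopen(req)` would send it, with the payload read from a real file in binary mode; Tika then detects the format and extracts text and metadata on the Solr side.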
INDEXING FILES.
SCALABILITY.
Scalability: remaining suitably efficient and practical
when applied to more data or more traffic
With a bigger index or more visitors,
searches get slower!
Testing Solr performance with SolrMeter
Solutions to keep good performance with more data:
1. Scale up: optimizing a single Solr server
2. Scale horizontally: moving to multiple Solr servers with replication
3. Scale deep: combining replication and sharding (for distributed search)
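The idea behind "scale deep" can be sketched in a few lines: replication spreads requests over identical copies of an index, while sharding splits the index and merges each shard's best hits. This is a toy model, not Solr's distributed search code: each shard is a hypothetical dict of {doc_id: score} for one query, host names are made up, and the merge assumes scores are comparable across shards.

```python
import heapq
from itertools import cycle


def shard_top_k(shard, k):
    """Local search on one shard: its k best (score, doc_id) pairs."""
    return heapq.nlargest(k, ((score, doc_id) for doc_id, score in shard.items()))


def distributed_top_k(shards, k):
    """Scatter the query to every shard, then merge the partial results."""
    partial = []
    for shard in shards:
        partial.extend(shard_top_k(shard, k))
    return [doc_id for _, doc_id in heapq.nlargest(k, partial)]


class ReplicaPool:
    """Round-robin over servers holding identical copies of the same shard."""

    def __init__(self, hosts):
        self._hosts = cycle(hosts)

    def next_host(self):
        return next(self._hosts)


if __name__ == "__main__":
    shards = [{"a": 0.9, "b": 0.2}, {"c": 0.8, "d": 0.95}]
    print(distributed_top_k(shards, 2))  # ['d', 'a']
    pool = ReplicaPool(["solr-a", "solr-b"])
    print([pool.next_host() for _ in range(3)])  # ['solr-a', 'solr-b', 'solr-a']
```

Each shard only needs to return its own top k, so the merge step stays small even when the full index is large.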
DURABLE SOLUTION.
SEARCHING DATA.
Factors influencing score:
1. Term frequency
the more often a term occurs in a document, the higher its score is.
2. Inverse document frequency
the rarer a term is in the whole index, the higher its score is.
3. Co-ordination factor
the greater the number of query clauses that match a document, the higher its score is.
4. Field length
the shorter the matching field is, the greater the matching document's score is.
5. Boosting
customized mathematical rules to increase the score.
In Magento, based on attribute weights,
e.g. name 5 -> manufacturer 4 -> sku 3 -> price 2 -> meta_keywords 1
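The five factors combine into one score per document. The sketch below is a simplified version of Lucene's classic TF-IDF formula (tf = sqrt(freq), idf = 1 + ln(N / (docFreq + 1)), length norm = 1 / sqrt(field length), coord = matched terms / query terms), with query normalization left out because it does not change the ranking; newer Solr versions default to BM25 instead. The documents are hypothetical pre-tokenized fields, and the weights mirror the Magento example (text fields only).

```python
import math

# Hypothetical attribute weights, after the Magento example.
FIELD_WEIGHTS = {"name": 5, "manufacturer": 4, "sku": 3, "meta_keywords": 1}

# Hypothetical, already analyzed documents: field -> list of terms.
DOCS = [
    {"name": ["sony", "headphones"], "manufacturer": ["sony"]},
    {"name": ["headphones", "amplifier", "stand"], "meta_keywords": ["sony"]},
    {"name": ["turntable"], "manufacturer": ["rega"]},
]


def idf(term, field, docs):
    """2. Rarer terms weigh more: 1 + ln(N / (docFreq + 1))."""
    doc_freq = sum(1 for d in docs if term in d.get(field, []))
    return 1.0 + math.log(len(docs) / (doc_freq + 1))


def score(query_terms, doc, docs):
    total = 0.0
    matched = 0
    for term in query_terms:
        term_matched = False
        for field, weight in FIELD_WEIGHTS.items():
            tokens = doc.get(field, [])
            freq = tokens.count(term)
            if freq == 0:
                continue
            term_matched = True
            tf = math.sqrt(freq)                        # 1. term frequency
            length_norm = 1.0 / math.sqrt(len(tokens))  # 4. shorter field, higher score
            # 5. the attribute weight acts as a field boost
            total += tf * idf(term, field, docs) ** 2 * weight * length_norm
        if term_matched:
            matched += 1
    coord = matched / len(query_terms)                  # 3. coordination factor
    return coord * total


if __name__ == "__main__":
    query = ["sony", "headphones"]
    for d in DOCS:
        print(round(score(query, d, DOCS), 2), d["name"])
```

The first document wins because "sony" hits the short, heavily weighted name and manufacturer fields, while the second only matches it in the low-weight meta_keywords field.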
SEARCH RELEVANCY.
ADMIN TOOLS.
1) Solr ships with an admin tool, but it is developer-oriented:
to check the schema, the index, the general configuration and Solr server availability, and to view
technical statistics…
2) Prefer the Magento backend
to check frequent requests or requests with no answer
Very helpful to analyze user expectations and then improve the catalog
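The two statistics that matter most to merchandisers, frequent requests and requests with no answer, are simple aggregations over a search log. A minimal sketch, assuming a hypothetical log of (query, number_of_results) entries; the entries and the normalization rule are made up for illustration and do not reflect Magento's storage format.

```python
from collections import Counter

# Hypothetical search log: one (query, number_of_results) entry per search.
SEARCH_LOG = [
    ("headphones", 42), ("Headphones", 42), ("turntable", 7),
    ("bluetooth speakr", 0), ("headphones", 42), ("bluetooth speakr", 0),
    ("dj controller", 0),
]


def normalize(query):
    """Group variants that differ only by case or spacing."""
    return " ".join(query.lower().split())


def most_frequent(log, n):
    """The n most common requests, with their counts."""
    return Counter(normalize(q) for q, _ in log).most_common(n)


def no_answer(log):
    """Requests that returned nothing, most common first."""
    return Counter(normalize(q) for q, hits in log if hits == 0).most_common()


if __name__ == "__main__":
    print(most_frequent(SEARCH_LOG, 2))  # [('headphones', 3), ('bluetooth speakr', 2)]
    print(no_answer(SEARCH_LOG))         # [('bluetooth speakr', 2), ('dj controller', 1)]
```

A misspelling like "bluetooth speakr" suggests a synonym or spelling rule to add; "dj controller" may point to a product missing from the catalog.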
ADMIN FEATURES.
Steps:
1. Install and configure Solr
(single or multiple servers, single or multiple languages, …)
2. Adapt the standard Magento product schema
to your project context
3. Define additional customized data to index,
such as other tables, files, …
4. Influence search relevance
by defining attribute weights
5. Integrate into the Magento frontend
CONCLUSION.
INTEGRATE SOLR IN YOUR PROJECT.
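Step 4 can be expressed directly in a Solr query: the Extended DisMax (edismax) parser accepts per-field boosts in its qf parameter. A minimal sketch that turns attribute weights into such a query URL, assuming a hypothetical core URL and field names; price is left out because it is a numeric field, better handled as a filter or facet than as free-text matching.

```python
from urllib.parse import urlencode

# Assumptions: hypothetical core URL and field names; weights taken from
# the Magento example (name 5, manufacturer 4, sku 3, meta_keywords 1).
SOLR_CORE_URL = "http://localhost:8983/solr/products"
ATTRIBUTE_WEIGHTS = {"name": 5, "manufacturer": 4, "sku": 3, "meta_keywords": 1}


def qf_param(weights):
    """Turn attribute weights into an edismax qf value such as 'name^5 sku^3'."""
    ordered = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return " ".join(f"{field}^{weight}" for field, weight in ordered)


def select_url(user_query, weights, rows=10):
    """Build a /select URL that searches all weighted fields at once."""
    params = {
        "q": user_query,
        "defType": "edismax",
        "qf": qf_param(weights),
        "rows": rows,
        "wt": "json",
    }
    return f"{SOLR_CORE_URL}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(qf_param(ATTRIBUTE_WEIGHTS))
    print(select_url("sony headphones", ATTRIBUTE_WEIGHTS))
```

Keeping the weights in one mapping means tuning relevance is a configuration change rather than a code change.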
COMPARISONS.
Features | Magento basic search engine | Magento with Solr
Product indexing | ▲ | ▲
Document indexing | | ▲
Synonyms | ▲ | ▲
Stemming | | ▲
Stop words | | ▲
Faceted search | ▲ | ▲
Relevance calculation | ▲ | ▲
Customizable relevance calculation | | ▲
Scalability | | ▲
Predictive search | | ▲
Admin tools (frequent requests, no answer…) | ▲ | ▲
No extra time needed to integrate | ▲ |
Solr clearly improves the user experience,
which increases your conversion rate.
Remember: 1 user in 2 is a searcher!
CS2 AG
PLATINUM MEMBER TYPO3 ASSOCIATION
MAGENTO GOLD PARTNER
SUGAR SILVER PARTNER
CUSTOMER RELATIONSHIP MANAGEMENT
ELECTRONIC COMMERCE
ONLINE MARKETING
Gerbegässlein 1 | CH-4450 Sissach
Feldeggstrasse 55 | CH-8008 Zürich
Phone: +41 61 333 22 22
Twitter: @CS2switzerland
www.CS2.ch