Transcript
Page 1: Understanding and visualizing solr explain information - Rafal Kuc

Understanding and visualising Solr explain information

Rafał Kuć, Marek Rogoziński, [email protected], [email protected], 18.10.2011

Page 2: Understanding and visualizing solr explain information - Rafal Kuc

My Background

• Rafał Kuć
  - Working with Lucene since 2002
  - Working with Solr since 2007
• Solr.pl
  - Co-founder (with Marek Rogoziński)
• Area of expertise
  - Lucene and Solr consultant and architect for many major e-commerce sites in Poland
  - Author of „Solr 3.1 Cookbook” by Packt Publishing
  - Father, husband, StarCraft II player and a gardener after hours ☺


Page 3: Understanding and visualizing solr explain information - Rafal Kuc

What I Will Cover

• Understanding and visualising Solr explain information
• How to make the information given by Apache Solr explain easily readable by a Solr user (even a not very technical one)
• Context
  - Complicated explain made simple
  - Explain other made even simpler
• What’s next to come


Page 4: Understanding and visualizing solr explain information - Rafal Kuc

A typical use case

Page 5: Understanding and visualizing solr explain information - Rafal Kuc

The Challenge

• Common questions like:
  - Why was this document found?
  - Why wasn’t this document found?
  - Why is this document ranked higher than the other one?
  - Why does the results list look like this?
• Considerations
  - Do we always have to answer those questions?
• So how do we let users get the answers they want?
  - That’s how http://explain.solr.pl was born


Page 6: Understanding and visualizing solr explain information - Rafal Kuc

Let’s look at a typical example

• You run a query
  - q=ddr&defType=dismax&qf=name^1000+description^100&bf=pow(price,1.5)&debugQuery=true&indent=true
• And you see the explain information


1.6771803 = (MATCH) sum of:
  0.64883727 = (MATCH) max of:
    0.64883727 = (MATCH) weight(name:ddr^1000.0 in 6), product of:
      0.99999994 = queryWeight(name:ddr^1000.0), product of:
        1000.0 = boost
        2.446919 = idf(docFreq=3, maxDocs=17)
        4.0867718E-4 = queryNorm
      0.6488373 = (MATCH) fieldWeight(name:ddr in 6), product of:
        1.4142135 = tf(termFreq(name:ddr)=2)
        2.446919 = idf(docFreq=3, maxDocs=17)
        0.1875 = fieldNorm(field=name, doc=6)
  1.028343 = (MATCH) FunctionQuery(pow(float(price),const(1.5))), product of:
    2516.272 = pow(float(price)=185.0,const(1.5))
    1.0 = boost
    4.0867718E-4 = queryNorm
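The request behind this output can be assembled programmatically. The sketch below only builds the URL and does not contact a server. The host, port and /solr/select path are assumptions (a default local Solr 3.x setup), not something stated on the slides; the parameters are the ones from the query above. Note that the "+" in the raw qf value is a URL-encoded space separating the two boosted fields.

```python
from urllib.parse import urlencode, parse_qs

# Assumption: a default local Solr 3.x instance; adjust host, port and path.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

# Parameters from the slide. In the raw query string "+" stands for a space,
# so qf holds two boosted fields separated by a space.
params = {
    "q": "ddr",
    "defType": "dismax",
    "qf": "name^1000 description^100",
    "bf": "pow(price,1.5)",
    "debugQuery": "true",   # asks Solr to include the explain section
    "indent": "true",       # keeps the response human-readable
}

query_string = urlencode(params)
url = SOLR_SELECT_URL + "?" + query_string
print(url)
```

Building the query string with urlencode avoids hand-escaping characters such as "^", "(" and ",".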

Page 7: Understanding and visualizing solr explain information - Rafal Kuc

Some theory

• tf – term frequency
• df – document frequency
• idf – inverse document frequency
• norm – normalization factor
  - queryNorm – query normalization factor
  - fieldNorm – field normalization factor
• coord – coordination factor (rewards documents that match more of the query terms)
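To make these factors concrete, here is a minimal sketch of how tf and idf are computed, assuming Lucene's DefaultSimilarity (the default in Solr 3.x): tf is the square root of the term count and idf is 1 + ln(maxDocs / (docFreq + 1)). The function names are illustrative, not Lucene API names. The values can be compared with the explain output shown on the neighbouring slides.

```python
import math

def tf(term_freq: float) -> float:
    """Term frequency factor: square root of the raw term count."""
    return math.sqrt(term_freq)

def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: rarer terms get a higher value."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def length_norm(num_terms: int) -> float:
    """Length part of fieldNorm, before index-time boosts are applied.

    Lucene stores the final norm in a single byte, so the value seen in an
    explain (for example 0.1875) is a lossy approximation.
    """
    return 1.0 / math.sqrt(num_terms)

# Compare with the slide: tf(termFreq=2) and idf(docFreq=3, maxDocs=17)
print(round(tf(2), 6))
print(round(idf(3, 17), 6))
```

Because fieldNorm is quantised at index time, it cannot be reproduced exactly from the field length alone; tf and idf, by contrast, follow directly from the counts printed in the explain.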


Page 8: Understanding and visualizing solr explain information - Rafal Kuc

Let’s take a look at it again

1.6771803 = (MATCH) sum of:
  0.64883727 = (MATCH) max of:
    0.64883727 = (MATCH) weight(name:ddr^1000.0 in 6), product of:
      0.99999994 = queryWeight(name:ddr^1000.0), product of:
        1000.0 = boost
        2.446919 = idf(docFreq=3, maxDocs=17)
        4.0867718E-4 = queryNorm
      0.6488373 = (MATCH) fieldWeight(name:ddr in 6), product of:
        1.4142135 = tf(termFreq(name:ddr)=2)
        2.446919 = idf(docFreq=3, maxDocs=17)
        0.1875 = fieldNorm(field=name, doc=6)
  1.028343 = (MATCH) FunctionQuery(pow(float(price),const(1.5))), product of:
    2516.272 = pow(float(price)=185.0,const(1.5))
    1.0 = boost
    4.0867718E-4 = queryNorm
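Read bottom-up, the explain is plain arithmetic: each "product of" node multiplies its children, "max of" takes the best clause, and "sum of" adds them. The sketch below recomputes the final score from the leaf values copied from the slide. The variable names are illustrative, and small differences in the last digits are expected because Lucene computes scores in 32-bit floats.

```python
# Leaf values copied from the explain output on the slide.
boost, idf, query_norm = 1000.0, 2.446919, 4.0867718e-4
tf, field_norm = 1.4142135, 0.1875

query_weight = boost * idf * query_norm    # queryWeight(name:ddr^1000.0)
field_weight = tf * idf * field_norm       # fieldWeight(name:ddr in 6)
term_score = query_weight * field_weight   # weight(name:ddr^1000.0 in 6)

# "max of": only one field clause matched, so the maximum is that clause.
dismax_score = max([term_score])

# FunctionQuery: pow(price, 1.5) * boost * queryNorm, with price = 185.0
function_score = (185.0 ** 1.5) * 1.0 * query_norm

# "sum of": the text match plus the boost function.
total = dismax_score + function_score
print(f"text match: {dismax_score:.6f}")
print(f"bf boost:   {function_score:.6f}")
print(f"total:      {total:.6f}")
```

In this example the boost function contributes more to the final score than the text match itself, which is exactly the kind of fact that is hard to spot in the raw explain.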

Page 9: Understanding and visualizing solr explain information - Rafal Kuc

A little more complicated example

36.50278 = (MATCH) sum of:
  1.54896 = (MATCH) sum of:
    0.46676102 = (MATCH) max of:
      0.46676102 = (MATCH) weight(name:hard^20.0 in 2), product of:
        0.5461986 = queryWeight(name:hard^20.0), product of:
          20.0 = boost
          2.734601 = idf(docFreq=2, maxDocs=17)
          0.009986806 = queryNorm
        0.8545628 = (MATCH) fieldWeight(name:hard in 2), product of:
          1.0 = tf(termFreq(name:hard)=1)
          2.734601 = idf(docFreq=2, maxDocs=17)
          0.3125 = fieldNorm(field=name, doc=2)
    0.46676102 = (MATCH) max of:
      0.46676102 = (MATCH) weight(name:drive^20.0 in 2), product of:
        0.5461986 = queryWeight(name:drive^20.0), product of:
          20.0 = boost
          2.734601 = idf(docFreq=2, maxDocs=17)
          0.009986806 = queryNorm
        0.8545628 = (MATCH) fieldWeight(name:drive in 2), product of:
          1.0 = tf(termFreq(name:drive)=1)
          2.734601 = idf(docFreq=2, maxDocs=17)
          0.3125 = fieldNorm(field=name, doc=2)
    0.61543787 = (MATCH) max of:
      0.098470055 = (MATCH) weight(manu:maxtor in 2), product of:
        0.03135923 = queryWeight(manu:maxtor), product of:
          3.1400661 = idf(docFreq=1, maxDocs=17)
          0.009986806 = queryNorm
        3.1400661 = (MATCH) fieldWeight(manu:maxtor in 2), product of:
          1.0 = tf(termFreq(manu:maxtor)=1)
          3.1400661 = idf(docFreq=1, maxDocs=17)
          1.0 = fieldNorm(field=manu, doc=2)
      0.61543787 = (MATCH) weight(name:maxtor^20.0 in 2), product of:
        0.6271846 = queryWeight(name:maxtor^20.0), product of:
          20.0 = boost
          3.1400661 = idf(docFreq=1, maxDocs=17)
          0.009986806 = queryNorm
        0.9812707 = (MATCH) fieldWeight(name:maxtor in 2), product of:
          1.0 = tf(termFreq(name:maxtor)=1)
          3.1400661 = idf(docFreq=1, maxDocs=17)
          0.3125 = fieldNorm(field=name, doc=2)
  34.95382 = (MATCH) FunctionQuery(float(price)), product of:
    350.0 = float(price)=350.0
    10.0 = boost
    0.009986806 = queryNorm
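Any tool that visualises an explain first has to turn the text into a tree. The following is a minimal sketch of such a parser, not the code behind explain.solr.pl. It assumes the raw debugQuery text, where every line has the form "value = description" and each nesting level is indented by two spaces. Node and parse_explain are hypothetical names, and the sample is an abridged copy of the "maxtor" clause from this slide.

```python
import re
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    value: float
    description: str
    children: List["Node"] = field(default_factory=list)

# leading indentation, numeric value, " = ", rest of the line
LINE = re.compile(r"^(\s*)(\S+) = (.*)$")

def parse_explain(text: str, indent_width: int = 2) -> Node:
    """Parse indented explain text into a tree (one node per line)."""
    root = None
    stack: List[Node] = []  # stack[d] is the latest node seen at depth d
    for raw in text.splitlines():
        if not raw.strip():
            continue
        m = LINE.match(raw)
        if m is None:
            raise ValueError(f"unrecognised explain line: {raw!r}")
        depth = len(m.group(1)) // indent_width
        if depth > len(stack):
            raise ValueError(f"unexpected indentation: {raw!r}")
        node = Node(float(m.group(2)), m.group(3).strip())
        if depth == 0:
            root = node
        else:
            stack[depth - 1].children.append(node)
        del stack[depth:]   # drop deeper levels that are now closed
        stack.append(node)
    if root is None:
        raise ValueError("empty explain text")
    return root

# Abridged from the slide: the per-field clauses for the term "maxtor".
SAMPLE = """\
0.61543787 = (MATCH) max of:
  0.098470055 = (MATCH) weight(manu:maxtor in 2), product of:
    0.03135923 = queryWeight(manu:maxtor), product of:
    3.1400661 = (MATCH) fieldWeight(manu:maxtor in 2), product of:
  0.61543787 = (MATCH) weight(name:maxtor^20.0 in 2), product of:
    0.6271846 = queryWeight(name:maxtor^20.0), product of:
    0.9812707 = (MATCH) fieldWeight(name:maxtor in 2), product of:
"""

tree = parse_explain(SAMPLE)
winner = max(tree.children, key=lambda n: n.value)
print("winning clause:", winner.description)
```

With the tree in hand, a "max of" node can be checked against its children: here the boosted name field wins over manu, so the manu match contributes nothing to the score.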

Page 10: Understanding and visualizing solr explain information - Rafal Kuc

And now, a real-life example

1.6287426 = (MATCH) sum of:
  0.8143703 = (MATCH) sum of:
    0.40718514 = (MATCH) max plus 0.01 times others of:
      4.154771E-7 = (MATCH) weight(description_nostemm:harry^10.0 in 36647), product of:
        4.4066886E-7 = queryWeight(description_nostemm:harry^10.0), product of:
          10.0 = boost
          7.5426636 = idf(docFreq=796, maxDocs=553224)
          5.8423506E-9 = queryNorm
        0.94283295 = (MATCH) fieldWeight(description_nostemm:harry in 36647), product of:
          1.0 = tf(termFreq(description_nostemm:harry)=1)
          7.5426636 = idf(docFreq=796, maxDocs=553224)
          0.125 = fieldNorm(field=description_nostemm, doc=36647)
      0.40718514 = (MATCH) weight(category_search:harri^2000000.0 in 36647), product of:
        0.123389944 = queryWeight(category_search:harri^2000000.0), product of:
          2000000.0 = boost
          10.559957 = idf(docFreq=38, maxDocs=553224)
          5.8423506E-9 = queryNorm
        3.2999864 = (MATCH) fieldWeight(category_search:harri in 36647), product of:
          1.0 = tf(termFreq(category_search:harri)=1)
          10.559957 = idf(docFreq=38, maxDocs=553224)
          0.3125 = fieldNorm(field=category_search, doc=36647)
      5.976383E-8 = (MATCH) weight(description:harri in 36647), product of:
        4.2931266E-8 = queryWeight(description:harri), product of:
          7.348286 = idf(docFreq=967, maxDocs=553224)
          5.8423506E-9 = queryNorm
        1.3920817 = (MATCH) fieldWeight(description:harri in 36647), product of:
          1.7320508 = tf(termFreq(description:harri)=3)
          7.348286 = idf(docFreq=967, maxDocs=553224)
          0.109375 = fieldNorm(field=description, doc=36647)
    0.40718514 = (MATCH) max plus 0.01 times others of:
      5.0300997E-7 = (MATCH) weight(description_nostemm:potter^10.0 in 36647), product of:
        4.84872E-7 = queryWeight(description_nostemm:potter^10.0), product of:
          10.0 = boost
          8.299262 = idf(docFreq=373, maxDocs=553224)
          5.8423506E-9 = queryNorm
        1.0374078 = (MATCH) fieldWeight(description_nostemm:potter in 36647), product of:
          1.0 = tf(termFreq(description_nostemm:potter)=1)
          8.299262 = idf(docFreq=373, maxDocs=553224)
          0.125 = fieldNorm(field=description_nostemm, doc=36647)
      0.40718514 = (MATCH) weight(category_search:Potter^2000000.0 in 36647), product of:
        0.123389944 = queryWeight(category_search:Potter^2000000.0), product of:
          2000000.0 = boost
          10.559957 = idf(docFreq=38, maxDocs=553224)
          5.8423506E-9 = queryNorm
        3.2999864 = (MATCH) fieldWeight(category_search:Potter in 36647), product of:
          1.0 = tf(termFreq(category_search:Potter)=1)
          10.559957 = idf(docFreq=38, maxDocs=553224)
          0.3125 = fieldNorm(field=category_search, doc=36647)
      5.7398886E-8 = (MATCH) weight(description:Potter in 36647), product of:
        4.656172E-8 = queryWeight(description:Potter), product of:
          7.9696894 = idf(docFreq=519, maxDocs=553224)
          5.8423506E-9 = queryNorm
        1.2327484 = (MATCH) fieldWeight(description:Potter in 36647), product of:
          1.4142135 = tf(termFreq(description:Potter)=2)
          7.9696894 = idf(docFreq=519, maxDocs=553224)
          0.109375 = fieldNorm(field=description, doc=36647)
  1.8327936E-6 = (MATCH) max plus 0.01 times others of:
    1.8327936E-6 = (MATCH) weight(description_nostemm:"harry potter"~100^10.0 in 36647), product of:
      9.255408E-7 = queryWeight(description_nostemm:"harry potter"~100^10.0), product of:
        10.0 = boost
        15.841926 = idf(description_nostemm: harry=796 potter=373)
        5.8423506E-9 = queryNorm
      1.9802407 = fieldWeight(description_nostemm:"harry potter" in 36647), product of:
        1.0 = tf(phraseFreq=1.0)
        15.841926 = idf(description_nostemm: harry=796 potter=373)
        0.125 = fieldNorm(field=description_nostemm, doc=36647)
  0.81437016 = (MATCH) sum of:
    0.40718508 = (MATCH) weight(category_the:harri in 36647), product of:
      0.12338993 = queryWeight(category_the:harri), product of:
        10.559957 = idf(docFreq=38, maxDocs=553224)
        0.011684701 = queryNorm
      3.2999864 = (MATCH) fieldWeight(category_the:harri in 36647), product of:
        1.0 = tf(termFreq(category_the:harri)=1)
        10.559957 = idf(docFreq=38, maxDocs=553224)
        0.3125 = fieldNorm(field=category_the, doc=36647)
    0.40718508 = (MATCH) weight(category_the:Potter in 36647), product of:
      0.12338993 = queryWeight(category_the:Potter), product of:
        10.559957 = idf(docFreq=38, maxDocs=553224)
        0.011684701 = queryNorm
      3.2999864 = (MATCH) fieldWeight(category_the:Potter in 36647), product of:
        1.0 = tf(termFreq(category_the:Potter)=1)
        10.559957 = idf(docFreq=38, maxDocs=553224)
        0.3125 = fieldNorm(field=category_the, doc=36647)
  3.394099E-7 = (MATCH) FunctionQuery(pow(int(sold),const(1.5))), product of:
    58.09475 = pow(int(sold)=15,const(1.5))
    1.0 = boost
    5.8423506E-9 = queryNorm
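The "max plus 0.01 times others" nodes come from the dismax tie-breaker: the best-scoring field counts in full and every other matching field adds only a fraction (here 0.01) of its score. A minimal sketch of that rule, using the per-field scores for the term "harry" copied from the slide; dismax_score is an illustrative helper, not a Solr API.

```python
from typing import Sequence

def dismax_score(clause_scores: Sequence[float], tie: float) -> float:
    """Best clause plus `tie` times the sum of the remaining clauses."""
    best = max(clause_scores)
    return best + tie * (sum(clause_scores) - best)

# Per-field scores for the term "harry", copied from the slide.
harry = {
    "description_nostemm:harry^10.0": 4.154771e-7,
    "category_search:harri^2000000.0": 0.40718514,
    "description:harri": 5.976383e-8,
}

score = dismax_score(list(harry.values()), tie=0.01)
dominant = max(harry, key=harry.get)
share = harry[dominant] / score

print(f"score = {score:.8f}")
print(f"dominant clause: {dominant} ({share:.4%} of the score)")
```

The huge boost on category_search makes that single field responsible for practically the whole clause score, while the description fields barely register. Surfacing this kind of imbalance is what a visual explain is for.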

Page 11: Understanding and visualizing solr explain information - Rafal Kuc

Let’s visualize now

Page 12: Understanding and visualizing solr explain information - Rafal Kuc

History view

Page 13: Understanding and visualizing solr explain information - Rafal Kuc

Basic information

Page 14: Understanding and visualizing solr explain information - Rafal Kuc

The real thing

Page 15: Understanding and visualizing solr explain information - Rafal Kuc

Even more ☺

Page 16: Understanding and visualizing solr explain information - Rafal Kuc

What if we can’t match?

Page 17: Understanding and visualizing solr explain information - Rafal Kuc

And the non-matching explain

Page 18: Understanding and visualizing solr explain information - Rafal Kuc

What you gain from explain.solr.pl

• View Solr explain information in a human-readable form
• Easily recognize the most influential elements of the scoring process
• Answer the questions faster
• More things to come in the future


Page 19: Understanding and visualizing solr explain information - Rafal Kuc

Plans for the future

• Support for more formats of Apache Solr explain (right now, only Solr 3.x is supported)
• Visualisation of additional data
• More functionality, such as:
  - query problems analysis
  - query syntax analysis and explanation
  - query time analysis and visualization
  - result comparison between cores or instances
• Very distant future: an additional web application deployed alongside Solr to enable real-time analysis of the influence of boosts

Page 20: Understanding and visualizing solr explain information - Rafal Kuc

Wrap Up

• http://explain.solr.pl should be available very soon (probably end of October or mid-November)
• Code of explain.solr.pl will be available on GitHub soon after the initial release
• There will be a Java version of http://explain.solr.pl which will cover much more information


Page 21: Understanding and visualizing solr explain information - Rafal Kuc

Sources

• Links
  - http://www.solr.pl
  - http://explain.solr.pl
  - http://lucene.apache.org ☺
• We would like to thank:
  - Łukasz Lewandowski ( http://llewandowski.pl/ ) for his work on the GUI
  - Hubert ‘depesz’ Lubaczewski ( http://depesz.com ) for the idea ☺


Page 22: Understanding and visualizing solr explain information - Rafal Kuc

Contact

• Rafał Kuć
  - [email protected]
  - http://solr.pl

• Marek Rogoziński
  - [email protected]
  - http://solr.pl


Page 23: Understanding and visualizing solr explain information - Rafal Kuc

Thank you

