Advanced Query Parsing Techniques Aruna Kumar Pamulapati (Arun) Technical Consultant

Upload: search-technologies

Post on 18-Dec-2014

DESCRIPTION

This presentation, given at the November 2013 Basis Technologies' Open Source Search Conference, reviews the role that advanced query parsing can play in building search systems. Topics include relevancy customization; taking input from user interface variables, such as the position on a website or geographical indicators; selecting which sources are to be searched; and drawing on third-party data sources. Query parsing can also enhance data security. Best practices for building and maintaining complex query parsing rules are discussed and illustrated. http://www.searchtechnologies.com/query-parsing-language.html

TRANSCRIPT

Page 1: Advanced Query Parsing Techniques

Advanced Query Parsing Techniques

Aruna Kumar Pamulapati (Arun)
Technical Consultant

Page 2: Advanced Query Parsing Techniques

2 The expert in the search space

Search Technologies Overview

Formed June 2005
Over 100 employees and growing
Over 500 customers worldwide
Presence in US, Latin America, UK & Germany
Deep enterprise search expertise
Consistent revenue growth and profitability
Search Engine Independent

Page 3: Advanced Query Parsing Techniques

Lucene Relevancy: Simple Operators

term(A) = TF(A) * IDF(A)
    Implemented with DefaultSimilarity / TermQuery
    TF(A) = sqrt(termInDocCount)
    IDF(A) = log(totalDocsInCollection / (docsWithTermCount + 1)) + 1.0

and(A, B) = A * B
    Implemented with BooleanQuery()

or(A, B) = A + B
    Implemented with BooleanQuery()

max(A, B) = max(A, B)
    Implemented with DisjunctionMaxQuery()
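
The TF and IDF formulas on this slide can be tried out directly. The following is a minimal Python sketch of those two formulas only; the function names are illustrative, and real Lucene scoring applies further factors (such as length norms) that the slide leaves out.

```python
import math

def tf(term_in_doc_count: int) -> float:
    """Term frequency factor from the slide: sqrt(termInDocCount)."""
    return math.sqrt(term_in_doc_count)

def idf(total_docs_in_collection: int, docs_with_term_count: int) -> float:
    """Inverse document frequency from the slide (natural logarithm)."""
    return math.log(total_docs_in_collection / (docs_with_term_count + 1)) + 1.0

def term_score(term_in_doc_count: int, total_docs: int, docs_with_term: int) -> float:
    """Simplified term(A) score: TF(A) * IDF(A)."""
    return tf(term_in_doc_count) * idf(total_docs, docs_with_term)

# A term occurring 4 times in a document, found in 9 of 1000 documents.
print(term_score(4, 1000, 9))
```

Rare terms (small docsWithTermCount) get a larger IDF, so they contribute more to the score than common ones.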

Page 4: Advanced Query Parsing Techniques

Simple Operators - Example

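Query tree: and(or(george, martha), max(washington, custis))

Term scores: george = 0.10, martha = 0.20, washington = 0.60, custis = 0.90

or(george, martha) = 0.10 + 0.20 = 0.30
max(washington, custis) = max(0.60, 0.90) = 0.90
and(0.30, 0.90) = 0.30 * 0.90 = 0.27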
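
The arithmetic in this example can be checked with a few lines of Python. This is a sketch of the simplified operator model from the slides (or adds, and multiplies, max takes the largest); the function names are illustrative and are not Lucene APIs.

```python
from functools import reduce

def op_or(*scores: float) -> float:
    """or(A, B): sum of the sub-query scores."""
    return sum(scores)

def op_and(*scores: float) -> float:
    """and(A, B): product of the sub-query scores."""
    return reduce(lambda a, b: a * b, scores, 1.0)

def op_max(*scores: float) -> float:
    """max(A, B): the largest sub-query score."""
    return max(scores)

# Term scores taken from the slide.
george, martha, washington, custis = 0.10, 0.20, 0.60, 0.90

result = op_and(op_or(george, martha), op_max(washington, custis))
print(round(result, 2))
```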

Page 5: Advanced Query Parsing Techniques

Less Used Operators

boost(f, A) = (A * f)
    Implemented with Query.setBoost(f)

constant(f, A) = if(A) then f else 0.0
    Implemented with ConstantScoreQuery()

boostPlus(A, B) = if(A) then (A + B) else 0.0
    Implemented with BooleanQuery()

boostMul(f, A, B) = if(B) then (A * f) else A
    Implemented with BoostingQuery()
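
The four definitions above translate almost directly into code. In this Python sketch, "if(A)" is modeled as "A has a score greater than zero", which is an assumption made for illustration; the function names mirror the slide and are not library calls.

```python
def boost(f: float, a: float) -> float:
    """boost(f, A): scale A's score by f."""
    return a * f

def constant(f: float, a: float) -> float:
    """constant(f, A): fixed score f when A matches, else 0.0."""
    return f if a > 0 else 0.0

def boost_plus(a: float, b: float) -> float:
    """boostPlus(A, B): A is required; B only adds to the score."""
    return a + b if a > 0 else 0.0

def boost_mul(f: float, a: float, b: float) -> float:
    """boostMul(f, A, B): scale A by f only when B also matches."""
    return a * f if b > 0 else a

print(boost_plus(0.5, 0.25))    # required query matched, optional adds on
print(boost_plus(0.0, 0.25))    # required query missed, so no score
print(boost_mul(0.5, 0.8, 0.1)) # B matched, so A is demoted by 0.5
```

With f below 1.0, boostMul demotes documents that match B; with f above 1.0, it promotes them.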

Page 6: Advanced Query Parsing Techniques

Problem: Need for More Flexibility

Difficult / impossible to use all operators
    Many not available in standard query parsers

Complex expressions = string manipulation
    This is messy

Query construction is in the application layer
    Your UI programmer is creating query expressions?
    Seriously?

Hard to create and use new operators
    Requires modifying query parsers - yuck

Page 7: Advanced Query Parsing Techniques

Query Processing Language

[Diagram: the User Interface sends requests to Solr, where the QPL Engine runs a QPL Script and passes the resulting query to Search.]

Page 8: Advanced Query Parsing Techniques

Introducing: QPL

Query Processing Language
    Domain Specific Language for Constructing Queries
    Built on Groovy
    https://wiki.searchtechnologies.com/index.php/QPL_Home_Page

Solr Plug-Ins
    Query Parser
    Search Component

“The 4GL for Text Search Query Expressions”

Server-side Solr Access
    Cores, Analyzers, Embedded Search, Results XML

Page 9: Advanced Query Parsing Techniques

Solr Plug-Ins

Page 10: Advanced Query Parsing Techniques

QPL Configuration – solrconfig.xml

Query Parser Configuration:

<queryParser name="qpl" class="com.searchtechnologies.qpl.solr.QPLSolrQParserPlugin">
  <str name="scriptFile">parser.qpl</str>
  <str name="defaultField">text</str>
</queryParser>

Search Component Configuration:

<searchComponent name="qplSearchFirst" class="com.searchtechnologies.qpl.solr.QPLSearchComponent">
  <str name="scriptFile">search.qpl</str>
  <str name="defaultField">text</str>
  <str name="isProcessScript">false</str>
</searchComponent>

Page 11: Advanced Query Parsing Techniques

QPL Example #1

Tokenize:
myTerms = solr.tokenize(query);

Phrase Query:
phraseQ = phrase(myTerms);

And Query:
andQ = and(myTerms);

Or Query:
orQ = (myTerms.size() <= 2) ? null : orMin( (myTerms.size()+1)/2, myTerms);

Put It All Together:
return phraseQ^3.0 | andQ^2.0 | orQ;
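
The Or Query step only adds a relaxed clause when the query has more than two terms, and then requires roughly half of them to match. The Python sketch below works through that threshold. It assumes the division is truncated to a whole number, which the slide does not state, and the function name is illustrative.

```python
from typing import Optional

def or_min_threshold(num_terms: int) -> Optional[int]:
    """Minimum number of terms that must match in the relaxed or-clause.

    Returns None for queries of one or two terms, where the slide skips
    the or-clause and relies on the phrase and and-clauses alone.
    Assumption: (n + 1) / 2 is truncated to an integer.
    """
    if num_terms <= 2:
        return None
    return (num_terms + 1) // 2

for n in range(1, 7):
    print(n, or_min_threshold(n))
```

Under that assumption, a three- or four-term query needs two matching terms, and a five- or six-term query needs three.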

Page 12: Advanced Query Parsing Techniques

Thesaurus Example #2

Tokenize:
myTerms = solr.tokenize(query);

Load Thesaurus: (cached)
thes = Thesaurus.load("thesaurus.xml")

Thesaurus Expansion:
thesQ = thes.expand(0.8f, solr.tokenizer("text"), myTerms);

Put It All Together:
return and(thesQ);

Original Query: bathroom humor
[or(bathroom, loo^0.8, wc^0.8), or(humor, jokes^0.8)]
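
To make the expansion step concrete, here is a small Python sketch that reproduces the "bathroom humor" output. The in-memory dictionary is a hypothetical stand-in for thesaurus.xml, and the expand function only imitates the shape of the result; it is not the QPL Thesaurus API.

```python
from typing import Dict, List

# Hypothetical thesaurus contents, chosen to match the slide's example.
THESAURUS: Dict[str, List[str]] = {
    "bathroom": ["loo", "wc"],
    "humor": ["jokes"],
}

def expand(weight: float, thesaurus: Dict[str, List[str]], terms: List[str]) -> List[str]:
    """Replace each term with or(term, synonym^weight, ...) when synonyms exist."""
    expanded = []
    for term in terms:
        synonyms = thesaurus.get(term, [])
        if synonyms:
            parts = [term] + [f"{s}^{weight}" for s in synonyms]
            expanded.append("or(" + ", ".join(parts) + ")")
        else:
            expanded.append(term)  # no synonyms: keep the term unchanged
    return expanded

print(expand(0.8, THESAURUS, ["bathroom", "humor"]))
```

The original term keeps full weight while each synonym is down-weighted, so exact matches rank above synonym matches.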

Page 13: Advanced Query Parsing Techniques

More Operators

Boolean Query Parser:
pQ = parseQuery("(george or martha) near/5 washington")

Relevancy Ranking Operators:
q1 = boostPlus(query, optionalQ)
q2 = boostMul(0.5, query, optionalQ)
q3 = constant(0.5, query)

Composite Queries:
compQ = and(compositeMax(["title":1.5, "body":0.8], "george", "washington"))

Page 14: Advanced Query Parsing Techniques

News Feed Use Case

Order  Documents           Date
1      markets+terms       Today
2      markets             Today
3      terms               Today
4      companies           Today
5      markets+terms       Yesterday
6      markets             Yesterday
7      terms               Yesterday
8      companies           Yesterday
9      markets, companies  older

Page 15: Advanced Query Parsing Techniques

News Feed Use Case – Step 1

Segments:
markets = split(solr.markets, "\\s*;\\s*")
marketsQ = field("markets", or(markets));

Terms:
terms = solr.tokenize(query);
termsQ = field("body", or(thesaurus.expand(0.9f, terms)))

Companies:
compIds = split(solr.compIds, "\\s*;\\s*")
compIdsQ = field("companyIds", or(compIds))

Page 16: Advanced Query Parsing Techniques

News Feed Use Case – Step 2

sdf = new SimpleDateFormat("yyyy-MM-dd")
c = Calendar.getInstance()

Today:
todayDate = sdf.format(c.getTime())
todayQ = field("date_s", todayDate)

Yesterday:
c.add(Calendar.DAY_OF_MONTH, -1)
yesterdayDate = sdf.format(c.getTime())
yesterdayQ = field("date_s", yesterdayDate)

Page 17: Advanced Query Parsing Techniques

News Feed Use Case – Step 3

Weighted Subject Queries:
sq1 = constant(4.0, and(marketsQ, termsQ))
sq2 = constant(3.0, marketsQ)
sq3 = constant(2.0, termsQ)
sq4 = constant(1.0, compIdsQ)
subjectQ = max(sq1, sq2, sq3, sq4)

Weighted Time Queries:
tq1 = constant(10.0, todayQ)
tq2 = constant(1.0, yesterdayQ)
timeQ = max(tq1, tq2)

Put it All Together:
recentQ = and(subjectQ, timeQ)
return max(recentQ, or(marketsQ, compIdsQ)^0.01)
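
The constant weights in this step are what produce the nine-level ordering of the news feed use case. The Python sketch below scores each row under the simplified operator model from the earlier slides (and multiplies, or adds, max takes the largest). Treating each matching clause of the fallback query as a score of 1.0 is an assumption made for illustration.

```python
def constant(f: float, matched: bool) -> float:
    """constant(f, A): fixed score f when A matches, else 0.0."""
    return f if matched else 0.0

def news_score(markets: bool, terms: bool, companies: bool,
               today: bool, yesterday: bool) -> float:
    subject = max(
        constant(4.0, markets and terms),
        constant(3.0, markets),
        constant(2.0, terms),
        constant(1.0, companies),
    )
    time = max(constant(10.0, today), constant(1.0, yesterday))
    recent = subject * time  # and(subjectQ, timeQ) modeled as a product
    # Fallback or(marketsQ, compIdsQ)^0.01; each match assumed to score 1.0.
    fallback = 0.01 * (float(markets) + float(companies))
    return max(recent, fallback)

# (markets, terms, companies, today, yesterday) for rows 1 to 9 of the table.
ROWS = [
    (True, True, False, True, False),
    (True, False, False, True, False),
    (False, True, False, True, False),
    (False, False, True, True, False),
    (True, True, False, False, True),
    (True, False, False, False, True),
    (False, True, False, False, True),
    (False, False, True, False, True),
    (True, False, False, False, False),
]

scores = [news_score(*row) for row in ROWS]
print(scores)
```

Because today is weighted ten times higher than yesterday, every subject level for today outranks every level for yesterday, and older documents surface only through the low-weight fallback.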

Page 18: Advanced Query Parsing Techniques

BT RLP Tokenizer Use Case – Step 1

Define field type:

<tokenizer class="com.basistech.rlp.solr.RLPTokenizerFactory"
           rlpContext="<PATH>rlp-context-bl1.xml"
           postAltLemmas="false"
           lang="eng"
           postPartOfSpeech="false"/>

QPL Expansion:

finalExpandedQuery = transform(queryTerms, [
  TERM: { ctx ->
    def btCustomTokens = solr.tokenize("subject_bt", ctx.op.term)
    if (btCustomTokens.size() > 1)
      return or(term(btCustomTokens[0])^1.5, or(btCustomTokens[1..-1]));
    else
      return ctx.op;
  }
]);

Page 19: Advanced Query Parsing Techniques

BT RLP Tokenizer Use Case – Step 2

Original User Query: following is "presentation on QPL"

QPL Parsed: and(and(term(following),term(is)), phrase(term(presentation),term(on),term(QPL)))

BT Expansion + QPL Transformation: and(and(or(term(following)^1.5,term(follow)),or(term(is)^1.5,term(be))),phrase(term(presentation),term(on),term(QPL)))
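
The transformation walks the parsed query tree and rewrites each term node whose tokenization yields more than one form. The Python sketch below imitates that recursion on a tuple-based tree. The LEMMAS table is a hypothetical stand-in for the RLP tokenizer output, and the helper names are illustrative rather than QPL functions. Unlike the QPL script, it puts all alternates in one flat or() instead of nesting a second or().

```python
# Hypothetical lemma table standing in for the RLP tokenizer's output.
LEMMAS = {"following": ["follow"], "is": ["be"]}

def term(text, boost=None):
    """Build a term node: ("term", text, optional boost)."""
    return ("term", text, boost)

def transform(node):
    """Recursively rewrite term nodes that have alternate lemmas."""
    kind = node[0]
    if kind == "term":
        _, text, _ = node
        alternates = LEMMAS.get(text, [])
        if alternates:
            # Prefer the exact form (boost 1.5) over its lemmas.
            return ("or", term(text, 1.5), *[term(a) for a in alternates])
        return node
    return (kind, *[transform(child) for child in node[1:]])

def render(node):
    """Print a tree in the operator notation used on the slide."""
    kind = node[0]
    if kind == "term":
        _, text, boost = node
        return f"term({text})" + (f"^{boost}" if boost is not None else "")
    return f"{kind}(" + ",".join(render(child) for child in node[1:]) + ")"

parsed = ("and",
          ("and", term("following"), term("is")),
          ("phrase", term("presentation"), term("on"), term("QPL")))

print(render(transform(parsed)))
```

Terms without an alternate lemma, such as those inside the phrase, pass through unchanged.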

Page 20: Advanced Query Parsing Techniques

BT RLP Tokenizer Use Case – Step 3

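[Diagram: tree view of the transformed query. The root "and" has two children: an "and" node over or(Following^1.5, follow) and or(is^1.5, be), and a "phrase" node over "Presentation on QPL".]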

Page 21: Advanced Query Parsing Techniques

Embedded Search Example #1

qTerms = solr.tokenize(qTerms);

Execute an Embedded Search:
results = solr.search('subjectsCore', or(qTerms), 50)

Create a query from the results:
subjectsQ = or(results*.subjectId)

Put it all together:
return field("title", and(qTerms)) | subjectsQ^0.9;
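
The pattern in this example is a two-stage search: a first search over a side index finds related subject IDs, and those IDs are folded into the main query. The Python sketch below imitates that flow with an in-memory list. The SUBJECTS data, the search function, and the string-building are all hypothetical stand-ins, not the QPL or Solr API.

```python
# Hypothetical contents of the side index ('subjectsCore' on the slide).
SUBJECTS = [
    {"subjectId": "S1", "text": "american presidents history"},
    {"subjectId": "S2", "text": "european monarchs"},
    {"subjectId": "S3", "text": "presidents of france"},
]

def search(index, terms, limit):
    """Return up to `limit` records whose text contains any of the terms."""
    hits = [doc for doc in index
            if any(t in doc["text"].split() for t in terms)]
    return hits[:limit]

def build_query(terms):
    """Combine a title query with subject IDs found by the embedded search."""
    results = search(SUBJECTS, terms, 50)
    subject_ids = [doc["subjectId"] for doc in results]
    title_q = 'field("title", and(' + ", ".join(terms) + "))"
    subjects_q = "or(" + ", ".join(subject_ids) + ")^0.9"
    return f"{title_q} | {subjects_q}"

print(build_query(["presidents"]))
```

Documents tagged with a related subject can then match even when their titles do not contain the query terms, at a slightly lower weight.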

Page 22: Advanced Query Parsing Techniques

Embedded Search Example #2

qTerms = solr.tokenize(qTerms);

Execute an Embedded Search:
results = solr.search('categories', and(qTerms), 10)

Create a Solr named list:
myList = solr.newList();
myList.add("relatedCategories", results*.title);

Add it to the XML response:
solr.addResponse(myList)

Page 23: Advanced Query Parsing Techniques

Other Features

Embedded Grouping Queries
    Oh yes they did!

Proximity operators
    ADJ, NEAR/#, BEFORE/#

Reverse Lemmatizer
    Prefers exact matches over variants

Transformer
    Applies transformations recursively to query trees

Page 24: Advanced Query Parsing Techniques

Query Processing Language

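[Diagram: the User Interface, owned by the Application Dev Team, sends data as entered by the user to the QPL Engine inside Solr. The QPL Script, owned by the Search Team, turns that data into a Boolean query expression for Search.]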

Page 25: Advanced Query Parsing Techniques

Query Processing Language

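[Diagram: the User Interface sends requests to Solr, where the QPL Engine runs a QPL Script and passes the resulting query to Search. The QPL Engine can also draw on an RDBMS, other indexes, and a thesaurus.]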

Page 26: Advanced Query Parsing Techniques

More on QPL…

http://www.searchtechnologies.com/query-parsing-language.html