2.droppdf.com2.droppdf.com/files/qrtgk/apache-solr-essentials.pdf · table of contents apache solr...

www.it-ebooks.info

Page 1: 2.droppdf.com2.droppdf.com/files/qRTGK/apache-solr-essentials.pdf · Table of Contents Apache Solr Essentials Credits About the Author Acknowledgments About the Reviewers Support

Apache Solr Essentials

Table of Contents

Apache Solr Essentials

Credits

About the Author

Acknowledgments

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. Get Me Up and Running

Installing a standalone Solr instance

Prerequisites

Downloading the right version

Setting up and running the server

Setting up a Solr development environment

Prerequisites

Importing the sample project of this chapter

Understanding the project structure

Different ways to run Solr

Background server

Integration test server

What do we have installed?

Solr home

solr.xml

schema.xml

solrconfig.xml

Other resources

Troubleshooting

UnsupportedClassVersionError

The “Failed to read artifact descriptor” message

Summary

2. Indexing Your Data

Understanding the Solr data model

The document

The inverted index

The Solr core

The Solr schema

Field types

The text analysis process

Char filters

Tokenizers

Token filters

Putting it all together

Some example field types

String

Numbers

Boolean

Date

Text

Other types

Fields

Static fields

Dynamic fields

Copy fields

Other schema sections

Unique key

Default similarity

Solr indexing configuration

General settings

Index configuration

Update handler and autocommit feature

RequestHandler

UpdateRequestProcessor

Index operations

Add

Sending add commands

Delete

Commit, optimize, and rollback

Extending and customizing the index process

Changing the stored value of fields

Indexing custom data

Troubleshooting

Multivalued fields and the copyField directive

The copyField input value

Required fields and the copyField directive

Stored text is immutable!

Data not indexed

Summary

3. Searching Your Data

The sample project

Querying

Search-related configuration

Query analyzers

Common query parameters

Field lists

Filter queries

Query parsers

The Solr query parser

Terms, fields, and operators

Boosts

Wildcards

Fuzzy

Proximity

Ranges

The Disjunction Maximum query parser

Query Fields

Alternative query

Minimum should match

Phrase fields

Query phrase slop

Phrase slop

Boost queries

Additive boost functions

Tie breaker

The Extended Disjunction Maximum query parser

Fielded search

Phrase bigram and trigram fields

Phrase bigram and trigram slop

Multiplicative boost function

User fields

Lowercase operators

Other available parsers

Search components

Query

Facet

Facet queries

Facet fields

Facet ranges

Pivot facets

Interval facets

Highlighting

Standard highlighter

Fast vector highlighter

Postings highlighter

More like this

Other components

Search handler

Standard request handler

Search components

Query parameters

RealTimeGetHandler

Response output writers

Extending Solr

Mixing real-time and indexed data

Using a custom response writer

Troubleshooting

Queries don’t match expected documents

Mismatch between index and query analyzer

No score is returned in response

Summary

4. Client API

Solrj

SolrServer – the Solr façade

Input and output data transfer objects

Adds and deletes

Search

Other bindings

Summary

5. Administering and Tuning Solr

Dashboard

Physical and JVM memory

Disk usage

File descriptors

Logging

Core Admin

Java properties and thread dump

Core overview

Caches

Cache life cycles

Cache sizing

Cached object life cycle

Cache stats

Types of cache

Filter cache

Query Result cache

Document cache

Field value cache

Custom cache

Query handlers

Update handlers

JMX

Summary

6. Deployment Scenarios

Standalone instance

Shards

Master/slaves scenario

Shards with replication

SolrCloud

Cluster management

Replication factor, leaders, and replicas

Durability and recovery

The new terminology

Administration console

Collections API

Distributed search

Cluster-aware index

Summary

7. Solr Extensions

DataImportHandler

Data sources

Documents, entities, and fields

Transformers

Entity processors

Event listeners

Content Extraction Library

Language Identifier

Rapid prototyping with Solaritas

Other extensions

Clustering

UIMA Metadata Extraction Library

MapReduce

Summary

8. Contributing to Solr

Identifying your needs

An example – SOLR-3191

Subscribing to mailing lists

Signing up on JIRA

Setting up the development environment

Version control

Code style

Checking out the code

Creating the project in your IDE

Making your changes

Creating and submitting a patch

Other ways to contribute

Documentation

Mailing list moderator

Summary

Index

Apache Solr Essentials

Apache Solr Essentials

Copyright © 2015 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: February 2015

Production reference: 1210215

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78439-964-1

www.packtpub.com

Credits

Author

Andrea Gazzarini

Reviewers

Ahmad Maher Abdelwhab

Markus Klose

Julian Lam

Puneet Singh Ludu

Commissioning Editor

Usha Iyer

Acquisition Editor

Larissa Pinto

Content Development Editor

Kirti Patil

Technical Editor

Ankur Ghiye

Copy Editor

Vikrant Phadke

Project Coordinator

Nidhi J. Joshi

Proofreaders

Stephen Copestake

Maria Gould

Bernadette Watkins

Indexer

Priya Sane

Graphics

Abhinash Sahu

Production Coordinator

Shantanu N. Zagade

Cover Work

Shantanu N. Zagade

About the Author

Andrea Gazzarini is a software engineer. He has mainly focused on the Java technology. Although often involved in analysis and design, he strongly loves coding and definitely likes to be considered a developer.

Andrea has more than 15 years of experience in various software branches, from telecom to banking software. He has worked for several medium- and large-scale companies, such as IBM and Orga Systems.

Andrea has several certifications in the Java programming language (programmer, developer, web component developer, business component developer, and JEE architect), BEA products (build and portal solutions), and Apache Solr (Lucid Apache Solr/Lucene Certified Developer).

In 2009, Andrea stepped into the wonderful world of open source projects, and in the same year, he became a committer for the Apache Qpid project. His adventure with Solr began in 2010, when he joined @Cult, an Italian company that mainly focuses its projects on library management systems, online access public catalogs, and linked data.

He’s currently involved in several (too many!) projects, always thinking about a “big” idea that will change his (developer) life.

Acknowledgments

I’d like to begin by thanking the people who made this book what it is. Writing a book is not a single person’s work, and help from experienced people that guide you along the path is crucial. Many thanks to Larissa, Kirti, Ankur, and Vikrant for supporting me in this process.

I am also grateful to the technical reviewers of the book, Ahmad Maher Abdelwhab, Markus Klose, Puneet Singh Ludu, and Julian Lam, for carefully reading my drafts and spotting (hopefully) most of my mistakes. This book would not have been so good without their help and input.

In general, I want to thank everyone who directly or indirectly helped me in creating this book, except for a long-sighted teacher who once told me when I was in university, “Hey, guy with all those earrings! You won’t go anywhere!”

Finally, a special thought to my family; to my girls, the actual supporters of the book; my wonderful wife, Nicoletta (to whom I promise not to write another book), my pride and joy, Sofia and Caterina, and my first actual teacher, my mom, Lina. They are the people who really made sacrifices while I was writing and who definitely deserve the credits for the book.

Once again, thank you!

About the Reviewers

Ahmad Maher Abdelwhab is currently working at Knowledgeware Technologies as an open source developer. He has over 10 years of experience, with special development skills in PHP, Drupal, Perl, Ruby On Rails, Java, XML, XSL, MySQL, PostgreSQL, MongoDB, SQL, and Linux. He graduated in computer science from Mansoura University in 2005.

I would like to thank my father, mother, and sincere wife for their continuous support while reviewing this book.

Markus Klose is a search and big data consultant at SHI GmbH & Co. KG in Germany. He is in charge of project management and supervision, project analysis, and delivering consulting and training services.

Most of Markus’ daily business is related to Apache Solr, Elasticsearch, and Fast ESP. He travels across Germany, Switzerland, and Austria to provide his services and knowledge.

On a regular basis, you can find him at meetups, user groups, or conferences such as Berlin Buzzwords or Solr Revolution, where he speaks about Apache Solr.

Besides search-related training and consulting, he is currently establishing additional areas of work. He uses tools such as Logstash and Kibana to fulfill customer requirements in monitoring and analytics.

Thanks to the experience gained from his daily work, Markus wrote the first German book on Apache Solr (Einführung in Apache Solr) with his colleague, Daniel Wrigley. It was published by O’Reilly in February 2014.

Besides writing, Markus spends a lot of his free time using his knowledge and programming skills to work on and contribute to open source projects such as Latin stemmer and number converter for Solr (https://issues.apache.org/jira/browse/LUCENE-4229) and SolrAppender for log4j2 (https://issues.apache.org/jira/browse/LOG4J2-618).

Julian Lam is a cofounder and core maintainer of NodeBB, a type of free and open source forum software built upon modern web tools, such as Node.js and Redis. He has spoken several times on topics related to JavaScript in the workplace and best practices for hiring. Julian is an advocate of client-side rendering, which can be used to build highly performant web applications.

www.PacktPub.com

Support files, eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt’s online digital book library. Here, you can search, access, and read Packt’s entire library of books.

Why subscribe?

Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser

Free access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.

Hi Dad, when you bought me my first computer, you had no idea what was coming next…

Preface

As you may have guessed from the title, this is a book about Apache Solr, specifically about Solr essentials. What do I mean by essentials? Nice question! Such a term can be seen from so many perspectives. Solr, mainly from 2010 onwards, witnessed exponential growth in terms of popularity, stakeholders, community, and the capabilities it offers. This rapid growth is reflected in the rich portfolio of things that have been developed in these years and are available today. So, strictly speaking, it’s not so easy to define the “essentials” of Solr.

The perspective that I will use to explain the term “essentials” is quite simple and pragmatic. I will describe the building blocks of Apache Solr, and at the same time, I will try to share my personal experience on those topics. In recent years, I’ve worked with Solr in several projects. As a user, I had to learn how to install, configure, tune, troubleshoot, and monitor Solr. As a developer, things were different for me. If you’re working in the IT domain and you’re reading this book (I guess you are), you probably know that each time you try to implement a solution, there’s something in the project that a specific tool doesn’t cover. So, after spending a lot of time analyzing, reading documentation, searching on the Internet, reading Wikis, and so on, you realize that you need to add a custom piece of code somewhere. That’s because “the product covers 99.9999 percent of the possible scenarios but…” For this specific case, if this happens or that happens, you always fall under that 0.0001 percent. I don’t know about you, but for me, this has always been so. No matter what the project, the company, or the team is, this has been an implicit constant of every project, always.

That’s the reason I will try as much as possible to explain things throughout the book using real-world examples coming directly from my personal experience. I hope this additional perspective will be useful for a better understanding of what is considered the most popular open source search platform.

What this book covers

Chapter 1, Get Me Up and Running, introduces the basic concepts of Solr and provides you with all the necessary steps to quickly get it up and running.

Chapter 2, Indexing Your Data, begins our first detailed discussion on Solr. In this chapter, we look at the data indexing process and see how it can be configured, tuned, and customized. This is also where we encounter the first line of code.

Chapter 3, Searching Your Data, explores the other, mirror-image side of Solr. First, we stored our data; now we explore all that Solr offers in terms of search services.

Chapter 4, Client API, covers client-side usage of Solr libraries, providing a description of the main use cases from a client’s perspective.

Chapter 5, Administering and Tuning Solr, takes you through the available tools for configuring, managing, and tuning Solr.

Chapter 6, Deployment Scenarios, illustrates the various ways in which you can deploy Solr, from a standalone instance to a distributed cluster.

Chapter 7, Solr Extensions, describes several available Solr extensions and how they can be useful in solving common concrete use cases.

Chapter 8, Contributing to Solr, explains the wonderful world of open source software by illustrating the individual steps of the participation and contribution process.

What you need for this book

In order to be able to run the code examples in the book, you will need the Java Development Kit (JDK) 1.7 and Apache Maven.

Alternatively, you will need an Integrated Development Environment (IDE). Eclipse is strongly recommended as it is the same environment I used to capture the screenshots. However, even if you want to use another IDE, the steps should be quite similar.

The difference between the two alternatives mainly resides in the role that you want to assume while reading. As a user, you may only want to start and execute the examples; as a developer, you will surely want to see the working code in a usable environment. That’s the reason an IDE is strongly recommended in the second case.

The first chapter will provide the instructions necessary for installing all that you’ll need throughout the book.

Who this book is for

This book is targeted at people, both users and developers, who are new to Apache Solr or are experienced with a similar product. The book will gradually help you to understand the focal concepts of Solr with the help of practical tips and real-world use cases. Although all the examples associated with the book can be executed with a few simple commands, a familiarity with the Java programming language is required for a good understanding.

Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and explanations of their meanings.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: “Each folder has a subfolder called conf where the configuration for that specific core resides.”

A block of code is set as follows:

[
  {"id": 1, "title": "The Birthday Concert"},
  {"id": 2, "title": "Live in Italy"},
  {"id": 3, "title": "Live in Paderborn"}
]

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>

Any command-line input or output is written as follows:

# mvn cargo:run -P fieldAnalysis

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: “Choose a field type or a field. Then press the Analyse Values button.”

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book: what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail <[email protected]>, and mention the book’s title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Alternatively, you can also download the examples from GitHub, on https://github.com/agazzarini/apache-solr-essentials. There, you can download the whole content as a zip file from https://github.com/agazzarini/apache-solr-essentials/archive/master.zip or, if you have git installed on your machine, you can clone the repository by issuing the following command:

# git clone https://github.com/agazzarini/apache-solr-essentials.git <path-to-your-work-dir>

Where <path-to-your-work-dir> is the destination folder where the project will be cloned.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books, maybe a mistake in the text or the code, we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <[email protected]> with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at <[email protected]>, and we will do our best to address the problem.

Chapter 1. Get Me Up and Running

This chapter describes how to install Solr and focuses on all the required steps to get a complete study and development environment that will guide us through the book.

Specifically, according to the double perspective previously described, I will illustrate two kinds of installations. The first is the installation of a standalone Solr instance (this is very quick). This is a simple task because the download bundle is preconfigured with all that you need to get your first taste of the product. As a developer, the second perspective is what I really need every day in my ordinary job: a working integrated development environment where I can run and debug Solr with my configurations and customizations, without having to manage an external server. In general, such an environment will have all that I need in one place for developing, debugging, and running unit and integration tests.

By the end of the chapter, you will have a running Solr instance on your machine, a ready-to-use Integrated Development Environment (IDE), and a good understanding of some basic concepts.

This chapter will cover the following topics:

Installation of a simple, standalone Solr instance from scratch
Setting up of an Integrated Development Environment
A quick overview about what we installed
Troubleshooting

Installing a standalone Solr instance

Solr is available for download as an archive that, once uncompressed, contains a fully working instance within a Jetty servlet engine. So the steps here should be pretty easy.

Prerequisites

In this section, we will describe a couple of prerequisites for the machine where Solr needs to be installed.

First of all, Java 6 or 7 is required: the exact choice depends on which version of Solr you want to install. In general, regardless of the version, make sure you have the latest update of your Java Virtual Machine (JVM). The following table describes the association between the latest Solr and Java versions:

Solr version    Java version
4.7.x           Java 6 or greater
4.8.x           Java 7 (update 55) or greater; Java 8 is verified to be compatible
4.9.x           Java 7 (update 55) or greater; Java 8 is verified to be compatible
4.10.x          Java 7 (update 55) or greater
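As a rough illustration, the version table can be encoded as a lookup so that a setup script can check a Java installation before starting Solr. This is a minimal sketch in Python and is not part of the book's sample project; the names MIN_JAVA_BY_SOLR, min_java_for, and java_is_sufficient are hypothetical, and the values are simply transcribed from the table.

```python
# Minimum Java requirement for each Solr 4.x minor release, transcribed from
# the table above as (Java major version, minimum update number).
# An update of 0 means the table names no specific update.
# All names in this sketch are hypothetical, not part of Solr.
MIN_JAVA_BY_SOLR = {
    (4, 7): (6, 0),
    (4, 8): (7, 55),
    (4, 9): (7, 55),
    (4, 10): (7, 55),
}


def min_java_for(solr_version):
    """Return (major, update) required by a Solr version such as '4.9.0'."""
    major, minor = (int(part) for part in solr_version.split(".")[:2])
    try:
        return MIN_JAVA_BY_SOLR[(major, minor)]
    except KeyError:
        raise ValueError("Solr %s is not covered by the table" % solr_version)


def java_is_sufficient(solr_version, java_major, java_update=0):
    """Check an installed Java (major, update) against the table's minimum."""
    # Tuples compare element by element, so (8, 0) >= (7, 55) holds.
    return (java_major, java_update) >= min_java_for(solr_version)
```

For instance, java_is_sufficient("4.8.1", 7, 51) is false because update 51 is older than the required update 55, while any Java 8 passes for the same release.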

Java can be downloaded from http://www.oracle.com/technetwork/java/javase/downloads/index.html.

Other factors such as CPU, RAM, and disk space strongly depend on what you are going to do with this Solr installation. Nowadays, it shouldn’t be hard to have a couple of GB available on your workstation. However, bear in mind that at this moment I’m playing with Solr 4.9.0 installed on a Raspberry Pi (its RAM is 512 MB). I gave Solr a maximum heap (-Xmx) of 256 MB, indexed about 500 documents, and executed some queries without any problem. But again, those factors really depend on what you want to do: we could say that, assuming you’re using a modern PC for a study instance, hardware resources shouldn’t be a problem.

Instead, if you are planning a Solr installation in a test or in a production environment, you can find a useful spreadsheet at https://svn.apache.org/repos/asf/lucene/dev/trunk/dev-tools/size-estimator-lucene-solr.xls.

Although it cannot encompass all the peculiarities of your environment, it is definitely a good starting point for RAM and disk space estimation.

Downloading the right version

The latest version of Solr at the time of writing is 4.10.3, but a lot of things we will discuss in the book are valid for previous versions as well.

You might already have Solr somewhere and might not want to redownload another instance, your customer might already have a previous version, or, in general, you might not want the latest version. Therefore, I will try to refer to several versions in the book (from 4.7.x to 4.10.x) as often as possible. Each time a feature is described, I will indicate the version where it appeared first.

The download bundle is usually available as a tgz or zip archive. You can find that at https://lucene.apache.org/solr/downloads.html.

Setting up and running the server

Once the Solr bundle has been downloaded, extract it in a folder. We will refer to that folder as $INSTALL_DIR. Type the following command to extract the Solr bundle:

    # tar -xvf $DOWNLOAD_DIR/solr-x.y.z.tar.gz -C $INSTALL_DIR

or

    # unzip $DOWNLOAD_DIR/solr-x.y.z.zip -d $INSTALL_DIR

depending on the format of the bundle.

At the end, you will find a new solr-x.y.z folder in your $INSTALL_DIR folder. This folder will act as a container for all Solr instances you may want to play with.

[Screenshot: the installation folder on my machine, holding three Solr versions]

The solr-x.y.z directory contains Jetty, a fast and small servlet engine, with Solr already deployed inside. So, in order to start Solr, we need to start Jetty. Open a new shell and type the following commands:

    # cd $INSTALL_DIR/solr-x.y.z/example
    # java -jar start.jar

You should see a lot of log messages ending with something like this:

    ...
    [INFO] org.eclipse.jetty.server.AbstractConnector – Started [email protected]:8983
    ...
    [INFO] org.apache.solr.core.SolrCore – [collection1] Registered new searcher Searcher@66b664d7[collection1] main{StandardDirectoryReader(segments_2:3:nrt _0(4.9):C32)}

These messages tell you Solr is up and running! Open a web browser and type http://127.0.0.1:8983/solr.

You should see the Solr administration console.

Setting up a Solr development environment

This section will guide you through the necessary steps to have a working development environment: a place to write and execute your code or configurations against a running and debuggable Solr instance.

If you aren’t interested in such a perspective because, for instance, your usage scenario falls within the previous section, you can safely skip this and proceed with the next section.

The source code included with this book contains a ready-to-use project for this section. I will later explain how to get it into your workspace in one shot.

Prerequisites

The development workstation needs to have some software. As you can see, I kept the list small and minimal.

Firstly, you need the Java Development Kit (JDK) 7, of which I recommend the latest update, although the oldest version of Solr covered by this book (4.7.x) is able to run with Java 6. Java 7 is supported from 4.7.x to 4.10.x, so it is definitely a recommended choice.

Lastly, we need an IDE. Specifically, I will use Eclipse to illustrate and describe the developer perspective, so you should download a recent JSE version (that is, Eclipse IDE for Java Developers) from https://www.eclipse.org/downloads.

Note: Do not download the EE version of Eclipse because it contains a lot of things we don’t need in this book.

Starting from Eclipse Juno, all the required plugins are already included. However, if you love an older version of Eclipse (such as Indigo) like I do, then Maven integration for Eclipse, also known as M2Eclipse (M2E), needs to be installed. You can find this in the Eclipse marketplace (go to Help | Eclipse Marketplace, then search for m2e, and click on the Install button).

Importing the sample project of this chapter

It’s time to see some code, in order to touch things with your hands. We will guide you through the necessary steps to have your Eclipse configured with a sample project, where you will be able to start, stop, and debug Solr with your code.

First, you have to import to Eclipse the sample project in your local ch1 folder. I assume you already got the source code from the publisher’s website or from GitHub, as described in the Preface. Open Eclipse, create a new workspace, and go to File | Import | Maven | Existing Maven Projects.

Tip: Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Alternatively, you can also download the examples from GitHub, at https://github.com/agazzarini/apache-solr-essentials. There, you can download the whole content as a zip file from https://github.com/agazzarini/apache-solr-essentials/archive/master.zip or, if you have git installed on your machine, you can clone the repository by issuing the following command:

    # git clone https://github.com/agazzarini/apache-solr-essentials.git <path-to-your-work-dir>

where <path-to-your-work-dir> is the destination folder where the project will be cloned.

In the dialog box that appears, select the ch1 folder and click on the Finish button. Eclipse will detect the Maven layout of that folder and will create a new project in your workspace, visible in the Project Explorer view.

Understanding the project structure

The project you’ve imported is very simple and contains just a few lines of code, but it is useful for introducing some common concepts that will guide us through the book (the other chapters use examples with a similar structure).

The following table shows the structure of the project:

Folder or file        Description
src/main/java         The main source folder. It is empty at the moment, but it will contain the Solr extensions (and dependent classes) you want to implement. You won’t find this directory in this first project because we don’t have the source files yet.
src/main/resources    This contains project resources such as properties and configuration files. You won’t find this directory in this first project because we don’t have any resources yet.
src/test/java         This source folder contains unit and integration tests. For this first project, you will find a single integration test here.
src/test/resources    This contains test resources such as properties and configuration files. It includes a sample logging configuration (log4j.xml).
src/dev/eclipse       Preconfigured Eclipse launchers used to run Solr and the examples in the project.
src/solr-home         This contains the Solr configuration files. We will describe the content of this directory later.
pom.xml               This is the Maven project definition. Here, you can configure any feature of your project, including dependencies, properties, and so on.

Within the Maven project definition (that is, pom.xml), you can do a lot of things. For our purposes right now, it is important to underline the plugin section, where you can see the Maven Cargo Plugin (http://cargo.codehaus.org/Maven2+plugin) configured to run an embedded Jetty 7 container and deploy Solr.

[Screenshot: the Cargo Plugin configuration section of pom.xml]

If you have the Build automatically flag set (the default behavior in Eclipse), most probably Eclipse has already downloaded all the required dependencies. This is one of the great things about Apache Maven.

So, assuming that you have no errors, it’s now time to start Solr. But where is Solr?

The first question that probably comes to mind is: “I didn’t download Solr! Where is it?” The answer is still Apache Maven, which is definitely a great open source tool for software management and something that simplifies your life.

Maven is already included in your Eclipse (by means of the m2e plugin), and the project you previously imported is a fully compliant Maven project.

So don’t worry! When we start a Maven build, Solr will be downloaded automatically. But where? In your local Maven repository, and you don’t need to concern yourself with that.

Note: Within the pom.xml file, you will find a property, <solr.version>, with a specific value. If you want to use a different version, just change the value of this property.

Different ways to run Solr

It’s time to start Solr in your IDE for the first time but, prior to that, it’s important to distinguish the two ways to run Solr:

- Background server: so that you can start and stop Solr for debugging purposes
- Integration test server: so that you can have a dedicated Solr instance to run your integration test suite

Background server

The first thing you will need in your IDE is a server instance that you can start, stop, and (in general) manage with a few simple commands.

In this way, you will be able to have Solr running with your configurations. You can index your data and execute queries in order to (manually) ensure that things are working as expected.

To get this type of server, follow these instructions:

1. Right-click on the project and create a new Maven (Debug) launch configuration (Debug As | Maven build…).
2. In the dialog, type cargo:run in the Goals text field.
3. Next, click on the Debug button.

The very first time you run this command, Maven will download all the required dependencies and plugins, including Solr. At the end, it will start an embedded Jetty instance.

Note: Why a Debug instead of a Run configuration?

You must use a Debug configuration so that you will be able to stop the server by simply pressing the red button on the Eclipse console. Run configurations have an annoying habit: Eclipse will say the process is stopped, but Jetty will still be running, often leaving an orphan process.

You should see the following output in the Eclipse console:

    [INFO] ------------------------------------------------------------
    [INFO] Building Chapter 1 Project 1.0
    [INFO] ------------------------------------------------------------
    Downloading: http://repo1.maven.org/maven2/org/apache/solr/solr/4.9.0/solr-4.9.0.war
    Downloaded: http://repo1.maven.org/maven2/org/apache/solr/solr/4.9.0/solr-4.9.0.war (28585 KB at 432.5 KB/sec)
    ...
    [INFO] Jetty 7.6.15.v20140411 Embedded started on port [8983]

This means that Solr is up and running and it is listening on port 8983. Now open your web browser and type http://127.0.0.1:8983/solr. You should see the Solr administration console.

Tip: In the project, and specifically in the src/dev/eclipse folder, there are some useful, ready-to-use Eclipse launchers. Instead of following the manual steps illustrated previously, just right-click on the start-embedded-solr.launch file and go to Debug As | run-ch1-example-server.launch.

Integration test server

Another important thing you could (or should, in my opinion) do in your project is to have an integration test suite. Integration tests are classes that, as the name suggests, run verifications against a running server.

When you’re working on a project with Solr and you want to implement an extension, a search component, or a plugin, you will obviously want to ensure that it is working properly. If you’re running an external Solr server, you need to pack your classes in a jar, copy that bundle somewhere (later, we will see where), start the server, and execute your checks.

There are a lot of drawbacks with this approach. Each time you get something wrong, you need to repeat the whole process: fix, pack, copy, restart the server, prepare your data, and run the check again. Also, you cannot easily debug your classes (or Solr classes) during that iterative check. All of this will most probably end with a lot of statements in your code as follows:

    System.out.println("BLABLABLA");

I suppose you know what I’m talking about.

This is where integration tests become very helpful. You can code your checks and your assertions as normal Java classes, and have an automated test suite that does the following each time it is executed:

- Starts an embedded Solr instance
- Executes your tests against that instance
- Stops the Solr instance
- Produces useful reports

The project we set up previously has that capability already, and there’s a very basic integration test in the src/test/java folder to simply add and query some data.

In order to run the integration test suite, create a new Maven run configuration (right-click on the project and go to Run As | Maven build…), and, in the dialog box, type clean install in the Goals text field.

After clicking on the Run button, you should see something like this:

    ...
    [INFO] Jetty 7.6.15.v20140411 Embedded starting…
    ...
    [INFO] Reading Solr Schema from schema.xml
    ...
    [INFO] Jetty 7.6.15.v20140411 Embedded started on port [8983]
    ...
    -------------------------------------------------------
     T E S T S
    -------------------------------------------------------
    Running org.gazzax.labs.solr.ase.ch1.it.FirstQueryITCase
    ...
    Results:
    Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

Tip: As before, under the src/dev/eclipse folder, there is already a preconfigured Eclipse launcher for this scenario. Right-click on the start-embedded-solr.launch file and go to Debug As | run-the-example-as-integration-test.

From the Eclipse log, you can see that a test (specifically, an integration test) has been successfully executed. You can find the source code of that test in the project we checked out before. The name of the class that is reported in the log is FirstQueryITCase (IT stands for Integration Test), and it is in the org.gazzax.labs.solr.ase.ch1.it package.

The FirstQueryITCase.java class demonstrates a basic interaction flow we can have with Solr:

    // This is the (input) Data Transfer Object between your client and SOLR.
    final SolrInputDocument input = new SolrInputDocument();

    // 1. Populates with (at least required) fields
    input.setField("id", 1);
    input.setField("title", "Apache SOLR Essentials");
    input.setField("author", "Andrea Gazzarini");
    input.setField("isbn", "972-2-5A619-12A-X");

    // 2. Adds the document
    client.add(input);

    // 3. Commit changes
    client.commit();

    // 4. Builds a new query object with a "select all" query.
    final SolrQuery query = new SolrQuery("*:*");

    // 5. Executes the query
    final QueryResponse response = client.query(query);

    // 6. Gets the (output) Data Transfer Object.
    final SolrDocument output = response.getResults().iterator().next();
    final String id = (String) output.getFieldValue("id");
    final String title = (String) output.getFieldValue("title");
    final String author = (String) output.getFieldValue("author");
    final String isbn = (String) output.getFieldValue("isbn");

    // 7.1 In case we are running as a Java application print out the query results.
    System.out.println("It works! I found the following book: ");
    System.out.println("--------------------------------------");
    System.out.println("ID: " + id);
    System.out.println("Title: " + title);
    System.out.println("Author: " + author);
    System.out.println("ISBN: " + isbn);

    // 7. Otherwise asserts the query results using standard JUnit procedures.
    assertEquals("1", id);
    assertEquals("Apache SOLR Essentials", title);
    assertEquals("Andrea Gazzarini", author);
    assertEquals("972-2-5A619-12A-X", isbn);

Tip: FirstQueryITCase is an integration test and a main class at the same time. This means that you can run it in three ways: as described earlier, as a main class, and as a JUnit test. If you prefer the second or the third option, remember to start Solr first (using the run-ch1-example-server.launch). You can find the launchers under the src/dev/eclipse folder. Just right-click on one of them and run the example in one way or another.

What do we have installed?

Regardless of the kind of installation, you should now have a Solr instance up and running, so it’s time to have a quick overview of its structure.

Solr is a standard JEE web application, packaged as a .war archive. If you downloaded the bundle from the website, you can find it under the webapps folder of Jetty, usually under:

    $INSTALL_DIR/solr-x.y.z/example/webapps

Instead, if you followed the developer way, Maven downloaded that war file for you, and it is now in your local repository (usually a folder called .m2 under your home directory).

Solr home

In any case, Solr has been installed and you don’t need to concern yourself with where it is physically located, mainly because all that you have to provide to Solr must reside in an external folder, usually referred to as the Solr home.

In the download bundle, there’s a preconfigured Solr home folder that corresponds to the $INSTALL_DIR/solr-x.y.z/example/solr folder. Within your Eclipse project, you can find that under the src folder; it is called (not surprisingly) solr-home.

In a Solr home folder, you will typically find a file called solr.xml, and one or more folders that correspond to your Solr cores (we will see what a core is in Chapter 2, Indexing Your Data). Each folder has a subfolder called conf where the configuration for that specific core resides.

solr.xml

The first file you will find within the Solr home directory is solr.xml. It declares some configuration parameters about the instance.

Previously (in Solr 4.4), you had to declare all the cores of your instance in this file. Now there’s a more intelligent autodiscovery mechanism that helps you avoid explicit declarations about the cores that are part of your configuration.

In the download bundle, you will find an example of a Solr home with only one core:

    $INSTALL_DIR/solr-x.y.z/example/solr

There is also an example with two cores:

    $INSTALL_DIR/solr-x.y.z/example/multicore

This directory is built using the old style we mentioned previously, with all the cores explicitly declared. In the Eclipse project, you can find the single core example in a directory called solr-home. The multicore example is in the example-solr-home-with-multicore folder.

schema.xml

Although the schema.xml file will be described in detail later, it is important to briefly mention it because this is the place where you can declare how your index (of a specific core) is composed, in terms of fields, types, and analysis, both at index time and query time. In other words, this is the schema of your index and (most probably) the first thing you have to design as part of your Solr project.

In the download bundle you can find a sample schema.xml, which is huge and full of comments, under the $INSTALL_DIR/solr-x.y.z/example/solr/collection1/conf folder. It basically illustrates all the predefined fields and types you can use in Solr (you can create your own type, but that’s definitely an advanced topic).

If you want to see something simpler for now, the Eclipse project under the solr-home/conf directory has a very simple schema, with a few fields and only one field type.

solrconfig.xml

The solrconfig.xml file is where the configuration of a Solr core is defined. It can contain a lot of directives and sections but, fortunately, for most of them Solr’s creators have set default values that are automatically applied if you don’t declare them.

Note: Default values are good for a lot of scenarios. When I was in Barcelona at the Apache Lucene Eurocon in 2011, the speaker asked during a presentation, “How many of you have ever changed default values in solrconfig.xml?” In a large room (200 people), only five or six people raised their hands.

This is most probably the second file you will have to configure. Once the schema has been defined, you can fine-tune the index chain and search behavior of your Solr instance here.

Other resources

Schema and Solr configurations can make use of other files for several purposes. Think about stop words, synonyms, or other configuration files specific to some component. Those files are usually put in the conf directory of the Solr core.

Troubleshooting

If you have problems related to what we described previously, the following tips should help you get things working.

UnsupportedClassVersionError

You can install more than one version of Java on your machine but, when running a command (for example, java or javac), the system will pick up the Java interpreter or compiler that is declared in your path. So if you get the UnsupportedClassVersionError error, it means that you’re using a wrong JVM (most probably Java 6 or older). In the Prerequisites section earlier in this chapter, there’s a table that will help you. However, this is the short version: Solr 4.7.x allows Java 6 or 7, but Solr 4.8 or greater runs only with (at least) Java 7.

If you’re starting Solr from the command line, just type this:

    # java -version

The output of this command will show the version of Java your system is actually using. So make sure you’re running the right JVM, and also check your JAVA_HOME environment variable; it must point to the right JVM.

If you’re running Solr in Eclipse, after checking what is described previously (that is, the JVM that starts Eclipse), make sure you’re using a correct JVM by navigating to Window | Preferences | Java | Installed JREs.
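The same check can be made from inside the JVM. What follows is only a minimal sketch and is not part of the book’s project: the class and method names (JavaVersionCheck, minimumJavaFor, majorOf) are hypothetical, and the mapping encodes nothing more than the short version stated above for Solr 4.7.x to 4.10.x.

```java
class JavaVersionCheck {

    // Minimum Java major version for the Solr releases covered in the text:
    // 4.7.x runs on Java 6, 4.8.x to 4.10.x need at least Java 7.
    static int minimumJavaFor(String solrVersion) {
        String[] parts = solrVersion.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        if (major != 4 || minor < 7 || minor > 10) {
            throw new IllegalArgumentException("Not covered by this sketch: " + solrVersion);
        }
        return minor == 7 ? 6 : 7;
    }

    // Converts a "java.specification.version" value to a major number:
    // "1.6" -> 6, "1.7" -> 7, "1.8" -> 8, "11" -> 11.
    static int majorOf(String specVersion) {
        if (specVersion.startsWith("1.")) {
            return Integer.parseInt(specVersion.substring(2));
        }
        return Integer.parseInt(specVersion);
    }

    public static void main(String[] args) {
        // The Solr version is an assumption; pass yours as the first argument.
        String solrVersion = args.length > 0 ? args[0] : "4.9.0";
        int running = majorOf(System.getProperty("java.specification.version"));
        int required = minimumJavaFor(solrVersion);
        System.out.println("Running Java " + running + "; Solr " + solrVersion
                + " needs Java " + required + " or greater");
        if (running < required) {
            System.out.println("This JVM is too old: expect UnsupportedClassVersionError");
        }
    }
}
```

Running this class with the same java command you use to start Solr tells you which JVM is really picked up from the path.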

The “Failed to read artifact descriptor” message

When running a command for the first time (for example, clean, install, or test), Apache Maven will have to download all the required libraries. In order to do that, your system must have a valid Internet connection.

So if you get this kind of message, it means that Maven wasn’t able to download a required dependency. The name of the dependency should be in the message. The reason for failure could be a network issue, either permanent or transient.

In the first case, you should simply check your connection. In the second scenario (that is, a transient network failure during the download), there are some manual steps that need to be done. Assume that the dependency is org.apache.solr:solr-solrj:jar:4.8.0. You should go to your local Maven repository and remove the content of the folder that hosts that dependency, like this:

    # rm -rf $HOME/.m2/repository/org/apache/solr/solr-solrj/4.8.0

On the next build, Maven will download that dependency again.
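The folder to remove follows directly from the dependency name in the error message. As a rough sketch (the class MavenLocalRepoPath and its method are hypothetical helpers, and only the four-part groupId:artifactId:packaging:version form shown above is handled), the mapping looks like this:

```java
import java.io.File;

class MavenLocalRepoPath {

    // Maps "groupId:artifactId:packaging:version" to the folder that hosts
    // the artifact, relative to the local repository root.
    static String folderFor(String coordinates) {
        String[] parts = coordinates.split(":");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Expected groupId:artifactId:packaging:version");
        }
        String groupId = parts[0];
        String artifactId = parts[1];
        String version = parts[3];
        // Dots in the groupId become folder separators.
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version;
    }

    public static void main(String[] args) {
        // Assumes the default repository location under the home directory.
        File repository = new File(System.getProperty("user.home"), ".m2/repository");
        String relative = folderFor("org.apache.solr:solr-solrj:jar:4.8.0");
        System.out.println(new File(repository, relative));
    }
}
```

The sketch only prints the folder; deleting it is left to you, so that nothing is removed by accident.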

Summary

In this chapter, we began our Solr tour with a quick overview, including the steps that must be performed when installing Solr. We illustrated the installation process from both a user’s and a developer’s perspective. Regardless of the path you followed, you should have a working Solr installed on your machine.

In the next chapter, we will continue our conversation by digging further into the Solr indexing process.

Chapter 2. Indexing Your Data

Although the final motive behind getting a Solr instance is to enable fast and efficient searches, we need to populate that instance with some data in the first (and mandatory) step. This operation is usually referred to as the indexing phase. The term index plays an important role in the Solr domain because its underlying structure is an index itself. This chapter focuses on the indexing process.

By the end of this chapter, you will be reasonably conversant with how the indexing process works in Solr, how to index data, and how to configure and customize the process.

This chapter will cover the following topics:

- The Solr data model: inverted index, document, fields, types, analyzers, and tokenizers
- Index and indexing configuration
- The Solr write path
- How to extend and customize the indexing process
- Troubleshooting

Understanding the Solr data model

Whenever I start to learn something that is not simple, I strongly believe the key to controlling its complexity is a good understanding of its domain model. This section describes the underlying building blocks of Solr. It starts with the simplest piece of information, the document, and then walks through the other fundamental concepts, describing how they form the Solr data model.

The document

A document represents the basic and atomic unit of information in Solr. It is a container of fields and values that belong to a given entity of your domain model (for example, a book, car, or person).

If you’re familiar with relational databases, you can think of a document as a record. The two concepts have some similarities:

- A document could have a primary key, which is the logical identity of the data it represents.
- A document has a structure consisting of one or more attributes. Each attribute has a name, type, and value.

However, a Solr document differs in the following ways from a database record:

- Attributes can have more than one value, whereas a column in a database row can hold only one value (including NULL).
- Attributes either have a value or don’t exist at all. There’s no notion of a NULL value in Solr.
- Attribute names can be static or dynamic, but table columns in a database must be explicitly declared in advance.
- Attribute types are, in general, more articulated and flexible because they must define how Solr interprets data both at index and query time.
- Attribute types can be defined and configured. This can be done by using, mixing, and configuring a rich set of built-in classes or creating new types (this is actually an advanced scenario).

A simple way to represent a Solr document is a map: a general data structure that maps unique keys (attribute names) to values, where each key (that is, attribute) can have one or more values. The following JSON data represents two documents:

    [
      {
        "id": 27302038,
        "title": "A book about something",
        "author": ["Ashler, Frank", "York, Lye"],
        "subject": ["Generalities", "Social Sciences"],
        "language": "English"
      },
      {
        "id": 2830002,
        "title": "Another book about something",
        "author": "Ypsy, Lea",
        "subject": "Geography & History",
        "publisher": "Vignanello: Edikin, 2010"
      }
    ]

Although the earlier documents represent books and have some common attributes, as you can see, the first has two subjects and a language, while the second doesn’t have a publication language. It has only one subject and an additional publisher attribute.

From a document’s perspective, there’s no constraint about which and how many attributes a document can have. Those constraints are instead declared within the Solr schema, which we will see later.
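To make the map analogy concrete, here is a small sketch in plain Java. It is not SolrJ’s SolrInputDocument and not how Solr stores documents internally; DocumentSketch and its methods are hypothetical names used only to show multivalued attributes and the absence of NULL values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DocumentSketch {

    // Attribute name -> one or more values, in insertion order.
    private final Map<String, List<Object>> fields = new LinkedHashMap<String, List<Object>>();

    // Adds a value; adding again under the same name makes the attribute multivalued.
    void add(String name, Object value) {
        List<Object> values = fields.get(name);
        if (values == null) {
            values = new ArrayList<Object>();
            fields.put(name, values);
        }
        values.add(value);
    }

    // An attribute either has values or does not exist: no NULL placeholder is stored.
    boolean has(String name) {
        return fields.containsKey(name);
    }

    // Returns the values of an attribute, or an empty list if it does not exist.
    List<Object> values(String name) {
        List<Object> values = fields.get(name);
        return values == null ? Collections.<Object>emptyList() : values;
    }

    public static void main(String[] args) {
        DocumentSketch first = new DocumentSketch();
        first.add("id", 27302038);
        first.add("title", "A book about something");
        first.add("author", "Ashler, Frank");
        first.add("author", "York, Lye");
        first.add("language", "English");

        DocumentSketch second = new DocumentSketch();
        second.add("id", 2830002);
        second.add("author", "Ypsy, Lea");
        second.add("publisher", "Vignanello: Edikin, 2010");

        System.out.println("Authors of the first book: " + first.values("author"));
        System.out.println("Second book has a language? " + second.has("language"));
    }
}
```

Nothing in this structure stops the two documents from carrying different attributes, which is exactly why the constraints live in the schema rather than in the document.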

Tip: The src/solr/example-data folder of the project associated with this chapter contains some example data where the same documents are represented in several formats.

The inverted index

Solr uses an underlying, persistent structure called an inverted index. It is designed and optimized to allow fast searches at retrieval time. To gain the speed benefits of such a structure, it has to be built in advance.

An inverted index consists of an ordered list of all the terms that appear in a set of documents. Beside each term, the index includes a list of the documents where that term appears.

For example, let’s consider three documents:

    [
      {"id": 1, "title": "The Birthday Concert"},
      {"id": 2, "title": "Live in Italy"},
      {"id": 3, "title": "Live in Paderborn"}
    ]

The corresponding inverted index would be something like this:

Term         Document IDs
             1    2    3
Birthday     X
Concert      X
Italy             X
Live              X    X
Paderborn              X
The          X
in                X    X

Like the index of a book (here, I mean the index that you usually find at the end of a book), if you want to search documents that contain a given term, an inverted index helps you do that efficiently and quickly.
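The structure can be sketched in a few lines of plain Java. This is only an illustration of the idea, not Lucene’s or Solr’s implementation: the class name InvertedIndexSketch is hypothetical, terms are produced by a naive split on whitespace, and no analysis (lowercasing, stop words) is applied.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

class InvertedIndexSketch {

    // Builds an ordered map from each term to the sorted ids of the documents containing it.
    static TreeMap<String, TreeSet<Integer>> build(Map<Integer, String> titles) {
        TreeMap<String, TreeSet<Integer>> index = new TreeMap<String, TreeSet<Integer>>();
        for (Map.Entry<Integer, String> doc : titles.entrySet()) {
            for (String term : doc.getValue().trim().split("\\s+")) {
                if (term.isEmpty()) {
                    continue; // an empty title produces no terms
                }
                TreeSet<Integer> ids = index.get(term);
                if (ids == null) {
                    ids = new TreeSet<Integer>();
                    index.put(term, ids);
                }
                ids.add(doc.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<Integer, String> titles = new LinkedHashMap<Integer, String>();
        titles.put(1, "The Birthday Concert");
        titles.put(2, "Live in Italy");
        titles.put(3, "Live in Paderborn");

        // Print every term with the documents where it appears.
        for (Map.Entry<String, TreeSet<Integer>> entry : build(titles).entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Answering "which documents contain Live?" is then a single lookup in the map instead of a scan over every title. Because the sketch uses plain string ordering, the lowercase term in sorts after the capitalized ones.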

In Solr, index files are hosted in a so-called Solr data directory. This directory can be configured in solrconfig.xml, the main configuration file.

Tip: After running any example in the project associated with this book, you will find the Solr index under the subfolders located in target/solr. The name of the subfolder actually depends on the name of the core used in the example.

The Solr core

The index configuration of a given Solr instance resides in a Solr core, which is a container for a specific inverted index. On the disk, Solr cores are directories, each of them with some configuration files that define features and characteristics of the core.

In a core directory, you will typically find the following content:

- A core.properties file that describes the core.
- A conf directory that contains configuration files: a schema.xml file, a solrconfig.xml file, and a set of additional files, depending on components in use for a specific instance (for example, stopwords.txt and synonyms.txt).
- A lib directory. Every JAR file placed in this directory is automatically loaded and can be used by that specific core.

In a Solr installation you can have one or more cores, each of them with a different configuration, which will therefore result in different inverted indexes.
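Since a core is just a directory with a recognizable layout, you can list the cores of a Solr home with a few lines of Java. The following is a simplified sketch under stated assumptions: CoreDirectoryScan is a hypothetical class, it looks only at the direct subfolders of the Solr home, and it treats the folder name as the core name. Solr’s own discovery logic is more involved, and a Solr home built in the old style (cores declared in solr.xml) may not contain core.properties files at all.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CoreDirectoryScan {

    // Returns the names of the direct subfolders of solrHome that contain a core.properties file.
    static List<String> coreNames(Path solrHome) throws IOException {
        List<String> names = new ArrayList<String>();
        try (DirectoryStream<Path> children = Files.newDirectoryStream(solrHome)) {
            for (Path child : children) {
                if (Files.isDirectory(child) && Files.isRegularFile(child.resolve("core.properties"))) {
                    names.add(child.getFileName().toString());
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // The default path is an assumption; pass your own Solr home as the first argument.
        Path solrHome = Paths.get(args.length > 0 ? args[0] : "src/solr-home");
        if (!Files.isDirectory(solrHome)) {
            System.out.println("Not a directory: " + solrHome);
            return;
        }
        System.out.println("Cores found in " + solrHome + ": " + coreNames(solrHome));
    }
}
```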

Note: The concept of the Solr core has been expanded in Solr 4, specifically in SolrCloud. We will discuss this in Chapter 6, Deployment Scenarios.

The Solr schema

Returning to the comparison with databases, another important difference is that, in relational databases, data is organized in tables. You can create one or more tables depending on how you want to organize the persistence of the entities belonging to your domain model.

In Solr, things behave differently. There’s no notion of tables; in a Solr schema, you must declare attributes, a primary key, and a set of constraints and features of the entity represented by the incoming documents. Although this doesn’t strictly mean you must have only one entity in your schema, let’s think in this way at the moment (for simplicity): a Solr schema is like the definition of a single table that describes the structure and the constraints of the incoming data (that is, documents).

The Solr schema is defined in a file called (not surprisingly) schema.xml. It contains several concepts, but the most important are certainly those related to types and fields. Before Solr 4.8, types and fields were declared within a <types> and a <fields> tag, respectively. Now their declarations can be mixed, which allows better grouping of fields with their corresponding types.

Tip: You can find a sample schema within the download bundle we set up in the previous chapter, specifically under $INSTALL_DIR/solr-x.y.z/example/solr/collection1/conf/schema.xml. It is huge and contains a lot of examples about predefined and built-in types and fields, with many useful comments.

Field types

Field types are one of the top-level entities declared in Solr schemas. A field type is declared using the <fieldType> element. As you can see in the example schema, you can have a simple type, such as this:

<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

You can also have types with a lot of information, as shown here:

<fieldType name="text-general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/>
  </analyzer>
</fieldType>


All types share a set of common attributes that are described in the following table:

| Attribute | Description |
|---|---|
| name | The name of the field type. This is required. |
| class | The fully qualified name of the class that implements the field type behavior. This is required. |
| sortMissingFirst, sortMissingLast | Optional attributes that are valid only for sortable fields. They define the sort position of the documents that have no values for a given field. |
| indexed | If this is true, fields associated with this type will be searchable, sortable, and facetable. |
| stored | If this is true, fields associated with this type are retrievable. Briefly, stored fields are what Solr returns in search responses. |
| multiValued | If this is true, fields associated with this type can have multiple values. |
| omitNorms | Norms are values consisting of one byte per field where Solr records index-time boost and length normalization data. Index-time boost allows one field to be boosted higher than another. Length normalization allows shorter fields to be boosted more than longer fields. If you don’t use index-time boost and don’t want to use length normalization, then this attribute can be set to true. |
| omitTermFreqAndPositions | Tokens produced by text analysis during the index process are not simply text. They also have metadata such as offsets, term frequency, and optional payloads. If this attribute is set to true, then Solr won’t record term frequencies and positions. |
| omitPositions | Omits the positions in indexed tokens. |
| positionIncrementGap | When a field has multiple values, this attribute specifies the distance between each value. This is used to prevent unwanted phrase matches. |
| autoGeneratePhraseQueries | Only valid for text fields. If this is set to true, then Solr will automatically generate phrase queries for adjacent terms. |
| compressed | In order to decrease the index size, stored values of fields can be compressed. |
| compressThreshold | Whenever the field is compressed, this is the associated compression threshold. |

Besides all of this, each specific type can declare its own attributes, depending on the characteristics of the type itself.

The text analysis process

Before talking about fields, which are the top-level building blocks of the Solr schema, let’s introduce a fundamental concept: text analysis.

The text analysis process converts an incoming value into tokens by means of a dedicated transformation chain that is in charge of manipulating the original input value. Each resulting token is then posted to the index with the following metadata:

- Position increment: The position of the token relative to the previous token in the input stream
- Start and end offset: The starting and ending indexes of the token within the input stream
- Payload: An optional byte array used for several purposes, such as boosting

A token with its metadata is usually referred to as a term.
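To make the token metadata more tangible, here is a minimal Python sketch that splits a value on whitespace and records, for each token, its position increment and its start and end offsets. The function name and tuple layout are assumptions made for this illustration; payloads are left out, and real analyzers produce this metadata inside Lucene, not in Python.

```python
import re

def whitespace_tokens(text):
    """Return (term, position_increment, start_offset, end_offset) for each token."""
    tokens = []
    for match in re.finditer(r"\S+", text):
        # Each token directly follows the previous one, so the increment is 1.
        tokens.append((match.group(), 1, match.start(), match.end()))
    return tokens

tokens = whitespace_tokens("Apache Solr Essentials")
for token in tokens:
    print(token)
# ('Apache', 1, 0, 6)
# ('Solr', 1, 7, 11)
# ('Essentials', 1, 12, 22)
```

A filter that removes a token (a stopword, for example) would typically leave a gap, which is expressed by a position increment greater than 1 on the next token.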

In Solr, text analysis happens at two different moments: index time and search time. In the first case, the value is the content of a given field of a given document that a client sent for indexing. In the second case, the incoming value typically contains search terms within a query.

In both cases, you must tell Solr how to handle those values. You can do that in the schema, in the field types section.

For field types, the following general rules always apply:

- If the field type implementation class is solr.TextField or it extends solr.TextField, then Solr allows you to configure one or two analyzer sections in order to customize the index and/or the query text analysis process
- In other cases, no analyzers can be defined, and the configuration of the type is done using the available attributes of the type itself

This is an example of a field type definition:

<fieldType name="text-general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
  </analyzer>
  <analyzer type="query">
  </analyzer>
</fieldType>

Here, you can see two different analyzer sections. In the first section, you will declare what happens at index time for a given field associated with that field type. The second section has the same purpose, but it is valid for query time.

Note: If you have the same analysis at index and query times, you can define just one <analyzer> section with no type attribute. It will then be applied to both phases.

Within each analyzer definition, you define the text analysis process by means of character filters, tokenizers, and token filters.

Char filters

Char filters are optional components that can be set at the beginning of the analysis chain in order to preprocess field values. They can manipulate a character stream by adding, removing, or replacing characters while preserving the original character position.


In the following example, two char filters are used to replace diacritics (that is, letters with glyphs such as à, ü) and remove some text:

<analyzer type="index">
  <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
  <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\(Author\)" replacement=""/>
</analyzer>

Note: You must never declare the implementation class. Instead, declare its factory.

Using the preceding chain, the text Millöcker, Carl (the name of an author) will become Millocker, Carl.

A complete list of available char filters can be found at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#CharFilterFactories.

Tokenizers

A tokenizer breaks an incoming character stream into one or more tokens depending on specific criteria. The resulting set of tokens is usually referred to as a token stream. An analyzer chain allows only one tokenizer.

Suppose we have “I’m writing a simple text” as the input text. The following table shows how two sample tokenizers work:

| Tokenizer | Description | Tokens |
|---|---|---|
| WhitespaceTokenizer | Splits by whitespaces | “I’m”, “writing”, “a”, “simple”, “text” |
| KeywordTokenizer | Doesn’t split at all | “I’m writing a simple text” |

A complete list of available tokenizers can be found at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#TokenizerFactories.

Token filters

Token filters work on an input token stream, contributing some kind of transformation to it. Analyzing token after token, a filter can apply its logic in order to add, remove, or replace tokens, and can thus produce a new output token stream.

Token filters can be chained together in order to produce complex analysis chains. The order in which those filters are declared is important because the chain itself is not commutative. Two chains with the same filters in a different order could produce a different output stream.
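Why the order matters can be shown with a small Python sketch. It models two filters as plain functions, assuming a case-sensitive stopword filter and a made-up stopword set; neither function is part of Solr, they only mimic the behavior described above.

```python
STOPWORDS = {"the", "a", "of"}  # assumed content of a stopwords file

def lowercase(tokens):
    """Mimics a lowercase filter."""
    return [token.lower() for token in tokens]

def remove_stopwords(tokens):
    """Mimics a case-sensitive stopword filter."""
    return [token for token in tokens if token not in STOPWORDS]

tokens = ["The", "Art", "of", "Search"]

stop_then_lower = lowercase(remove_stopwords(tokens))
lower_then_stop = remove_stopwords(lowercase(tokens))

print(stop_then_lower)  # ['the', 'art', 'search']
print(lower_then_stop)  # ['art', 'search']
```

In the first chain, “The” survives because it is compared with the stopword list before being lowercased; in the second chain it is lowercased first and therefore removed.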

This is an extract of a sample filter chain:

<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>


A filter declaration includes the name of the implementation factory class and a set of attributes that are specific to each filter. In the preceding chain, this is what happens for each token in the input stream:

- The token is made into lowercase, so “Happy” will become “happy”
- If the token is a stopword, that is, one of the words declared in a file called stopwords.txt, it gets filtered from the outgoing stream

A complete list of available token filters is available at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#TokenFilterFactories.

Putting it all together

The following code illustrates a complete field type definition:

<fieldType name="my-text-type" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

In order to get a concrete view of what happens during the index phase of a given field, open a shell in the top-level directory of the project associated with this chapter. Next, type the following command:

# mvn cargo:run -PfieldAnalysis

Tip: You can do the same with Eclipse by creating a new Maven Debug launch configuration. On the launch dialog, you must fill the Goals input field with cargo:run and the Profile input field with fieldAnalysis.

That will start a Solr instance with an example schema that contains several types. Once Solr has been started, open your browser and type http://127.0.0.1:8983/solr/#/analysis/analysis. The page that appears lets you simulate the index phase of a given value (the content of the left text area) for a given field or field type (the content of the drop-down menu at the bottom of the page).

Type some text in the Field Value (Index) text area, choose a field type or a field, and press the Analyse Values button. The page will show the input and the output values of each member of the index chain. The following screenshot illustrates the resulting page after analyzing the “Apache Solr” text with a right_truncated_phrase field type:


Some example field types

This section lists and describes some important field types and their main features in a non-exhaustive way. The schema.xml file in the download bundle contains a lot of examples with all the available types.

In addition, a list of all field types is available at https://cwiki.apache.org/confluence/display/solr/Field+Types+Included+with+Solr.

String

The string type retains the incoming value as a single token.

Note: That doesn’t mean the field cannot be indexed. It only means that the field cannot have a user-defined analysis chain.

This type is usually associated with the following:

- Indexed fields: Fields that represent codes, classifications, and identifiers, such as A340, 853.92, SKU#22383, 3919928832, 292381, and en-US
- Sort fields: Fields that can be used as sort criteria, such as authors, titles, and publication dates

Numbers

There are several numeric types defined in Solr. They can be classified into three groups:

- Basic types such as IntField, FloatField, and LongField. These are the legacy types that encode numeric values as strings.
- Sortable field types such as SortableDoubleField, SortableIntField, and SortableLongField. These are the legacy types that encode numeric values as strings in order to match their natural numeric order (this is different from the string’s lexicographic order).
- Trie field types such as TrieIntField, TrieFloatField, and TrieLongField. These are the types that index numeric values using various and tunable levels of precision in order to enable efficient range queries and sorting. Those levels are configured using a precisionStep attribute in the field type definition.

The first two groups, basic and sortable types, are deprecated and will soon be removed (most probably in Solr 5.0). This is because their features and characteristics are already included in Trie types, which are more efficient and provide a unified way of dealing with numbers.

Boolean

Boolean fields can have a value of true or false. Values of 1, t, or T are interpreted as true.

Date

The format that Solr uses for dates is a restricted version of the ISO 8601 Date and Time format and is of the YYYY-MM-DDThh:mm:ss.SSSZ form. Here are some examples of this field type:

2005-09-27T14:43:11Z
2011-08-23T02:43:00.992Z

The Z character is a literal, trailing constant that indicates that the date is expressed in UTC. Only the milliseconds are optional. If they are missing, the dot (.) after the seconds must be removed.

As with numbers, there are two available types to represent dates in Solr:

- A basic DateField type, which is a deprecated legacy type
- TrieDateField, which is the recommended date type

A useful feature of date types is a simple expression language that can be used to form dynamic date expressions, like this:

NOW+2YEARS
NOW+3YEARS-3DAYS
2005-09-27T14:43:00Z+1YEAR

The expression language allows the following keywords:

| Keyword | Description |
|---|---|
| YEAR / YEARS | One or more years. These are basically synonyms; the difference is just to make the expressions more readable (for example, 2YEARS is better than 2YEAR). |
| MONTH / MONTHS | One or more months (for example, NOW+4MONTHS, NOW-1MONTH). |
| DAY / DAYS / DATE | A day or a certain number of days (for example, NOW+1DAY). |
| HOUR / HOURS | An hour or a certain number of hours. |
| MINUTE / MINUTES | One or more minutes. |
| SECOND / SECONDS | One or more seconds. |
| MILLI / MILLIS / MILLISECOND / MILLISECONDS | One or more milliseconds. |
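The structure of these expressions can be illustrated with a small Python parser. It is a sketch under stated assumptions: it only handles the additive form shown above (a base of NOW or a UTC timestamp, followed by signed steps), it does not evaluate the result, and the function name is hypothetical. It is not Solr’s own parser.

```python
import re

# Unit keywords accepted in date math expressions (singular and plural forms).
UNITS = {"YEAR", "YEARS", "MONTH", "MONTHS", "DAY", "DAYS", "DATE",
         "HOUR", "HOURS", "MINUTE", "MINUTES", "SECOND", "SECONDS",
         "MILLI", "MILLIS", "MILLISECOND", "MILLISECONDS"}

# Base (NOW or a timestamp ending in Z) followed by zero or more signed steps.
EXPRESSION = re.compile(
    r"^(NOW|\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z)((?:[+-]\d+[A-Z]+)*)$")

def parse_date_math(expression):
    """Split an expression into its base and a list of (amount, unit) steps."""
    match = EXPRESSION.match(expression)
    if match is None:
        raise ValueError("not a date math expression: " + expression)
    base, tail = match.groups()
    steps = []
    for sign, amount, unit in re.findall(r"([+-])(\d+)([A-Z]+)", tail):
        if unit not in UNITS:
            raise ValueError("unknown unit: " + unit)
        steps.append((int(sign + amount), unit))
    return base, steps

print(parse_date_math("NOW+3YEARS-3DAYS"))
# ('NOW', [(3, 'YEARS'), (-3, 'DAYS')])
```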

Text

Text is the basic type for fields that can have a configurable text analysis. This is the only type that accepts analyzer chains in configurations.

Other types

The following list briefly describes some other interesting types:

- Currency: This type provides dedicated support for monetary values. It also includes the capability to plug in several providers for determining exchange rates between currencies.
- Binary: This type is used to handle binary data. Data is sent and retrieved in Base64-encoded strings.
- Geospatial types: Two types are available to support geospatial searches. The first is LatLonType, from Solr 3.x onwards. The second type, SpatialRecursivePrefixTreeFieldType, is a new type introduced in Solr 4, and it supports polygon shapes.
- Random: This is used to generate random sequences. It is useful if you want pseudorandom sort ordering of indexed documents.

Fields

Fields are containers of values associated with a specific type. They represent the structure and the composition of the entity of your domain model.

In simple words, fields are the attributes of the documents you’re going to manage with Solr. So, for example, if Solr serves a library Online Public Access Catalogue (OPAC), the entities in the schema will most probably represent books, and they could have fields such as title, author, ISBN, cover, and so on.

Fields are declared in the schema. Each field declaration includes a name, a type, and a set of attributes. This is an example of a field declaration:

<field name="title" type="string" indexed="false" stored="true" required="true" multiValued="false"/>

The following table lists the attributes that can be specified for each field:

| Attribute | Description |
|---|---|
| name | The name of the field must be unique in the schema and must consist only of alphanumeric and underscore characters. It must not start with a digit, and it must not have both a leading and a trailing underscore because those kinds of names are reserved. |
| type | This is the type associated with the field. |
| indexed | If this is true, the field will be searchable, sortable, and facetable. It overrides the same setting on the associated type. |
| stored | If this is true, it makes the field retrievable. It overrides the same setting on the associated type. |
| required | This marks the field as mandatory in input documents. |
| default | A default value that will be used at index time, if the field in the input document doesn’t have a valid value. |
| sortMissingFirst, sortMissingLast | These are optional attributes defining the sort position of the documents that have no values for that field. They override the same settings on the associated type. |
| omitNorms | Omits the norms associated with this field. Overrides the same attribute on the field type. |
| omitPositions | Omits the term positions associated with this field. Overrides the same attribute on the field type. |
| omitTermFreqAndPositions | Omits the term frequency and positions associated with this field. Overrides the same attribute on the field type. |
| termVectors | Stores the term vectors. A term vector is a list of the document’s terms and their number of occurrences in that document. |
| docValues | Only available for the String, Trie, and UUID fields. This attribute enhances the index by adding a column-oriented, document-to-value mapping for the field. |

Static fields

The first category of fields contains those statically declared in the schema. In this context, static simply means that the name of the field is explicitly known in advance. This is an example of a static field:

<field name="isbn" (other attributes follow)/>

Dynamic fields

There are certain situations where you don’t know in advance the name of some fields in the incoming documents. Although this may sound strange, it is rather a frequent scenario.

Think about a document that represents a book and is the result of some kind of cataloguing. In general, a bibliographic record has a lot of fields. Some of them represent text that can be expressed by cataloguers in several languages. For example, you can have a book with these abstracts:

{
  "id": 92902893,
  "abstract_en": "This is the English summary",
  "abstract_es": "Éste es el resumen en español",
  (other fields follow)
}

You can have another book with the following definition:

{
  "id": 92902893,
  "abstract_it": "L'automazione della biblioteca digitale",
  (other fields follow)
}

So the question here is, how can we define the abstract field (or fields) in our schema? The first approach could be to declare several static fields, one for each language, but this will be valid only if we know all the input languages in advance. Moreover, this is not very extensible because adding a new language (for example, abstract_ru) will require a change in the schema. Dynamic fields are the alternative.

A field is dynamic when its name includes a leading or a trailing wildcard, therefore allowing a dynamic match with incoming input fields. A dynamic field is declared using the <dynamicField> element, as follows:

<dynamicField name="abstract_*" (other attributes follow)/>

This declaration will catch all incoming fields whose names start with the abstract_ prefix. Hence, it avoids the need to statically define fields one by one, but most importantly, it will catch any abstract field regardless of its language suffix.
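The matching rule can be sketched as a small Python function. It is an illustration only, assuming a single leading or trailing wildcard as described above; the function name is made up and this is not Solr’s matching code.

```python
def matches(pattern, field_name):
    """Return True if field_name matches a dynamic field pattern.

    Only one leading or trailing '*' is supported.
    """
    if pattern.startswith("*"):
        return field_name.endswith(pattern[1:])
    if pattern.endswith("*"):
        return field_name.startswith(pattern[:-1])
    return field_name == pattern

incoming = ["id", "abstract_en", "abstract_es", "abstract_it", "abstract_ru"]
caught = [name for name in incoming if matches("abstract_*", name)]
print(caught)  # ['abstract_en', 'abstract_es', 'abstract_it', 'abstract_ru']
```

Note that abstract_ru is caught without any schema change, which is exactly the extensibility argument made above.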

Copy fields

In the Solr schema, you can use a special copyField directive to copy one field to another. This is useful when a document has a given field, and starting from its value, you want to have other fields in your schema populated with the same value but with a different text analysis.

Let’s suppose your documents represent books that can contain two different kinds of authors:

- persons (for example, Dante Alighieri and Leonardo Da Vinci)
- corporates (for example, Association for Childhood Education International)

You must show those authors separately in the user interface, as part of customer requirements. You can give them dedicated labels, for example. At the same time, the customer wants to have an author search feature on the user interface that triggers a search for all kinds of authors. The following screenshot shows a GUI widget that is often used in these scenarios: a search toolbar with a drop-down menu that allows the user to constrain the scope of the search within a given context (for example, authors, subjects, and titles):

A first approach could be to have two stored and indexed fields. When the user searches for an author by typing a name or a surname, such terms will be searched within those two fields. The schema in this case should be as follows:

<field name="author_person" type="text" indexed="true" stored="true" …/>
<field name="author_corporate" type="text" indexed="true" stored="true" …/>

A second choice could be to have a more cohesive design by separating search and view responsibilities. In this case, we will have two stored (but not indexed) fields representing the two kinds of authors, and a generic indexed (but not stored) author_search field containing all the authors of a document, regardless of their type. In this way, the user interface will use the stored fields for visualization, while Solr will use the catch-all author_search field for searches. This design introduces the copyField directive; here is the corresponding schema:

<field name="author_person" type="string" indexed="false" stored="true" required="false" multiValued="true"/>
<field name="author_corporate" type="string" indexed="false" stored="true" required="false" multiValued="true"/>
<field name="author_search" type="text" indexed="true" stored="false" required="false" multiValued="true"/>

<copyField source="author_person" dest="author_search"/>
<copyField source="author_corporate" dest="author_search"/>

The copyField directive copies the incoming value of the source field into the dest field; thus, at the end, the author_search field will contain all kinds of authors.
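The effect of the two directives can be simulated in Python. This is a simplified model, assuming documents are dictionaries of multivalued fields; the function name is hypothetical, and in Solr the copy happens at index time, before the text analysis of the destination field.

```python
def apply_copy_fields(document, copy_fields):
    """Return a copy of document with each source value appended to its dest field."""
    result = {name: list(values) for name, values in document.items()}
    for source, dest in copy_fields:
        for value in document.get(source, []):
            result.setdefault(dest, []).append(value)
    return result

# (source, dest) pairs mirroring the two copyField declarations
copy_fields = [("author_person", "author_search"),
               ("author_corporate", "author_search")]

book = {
    "author_person": ["Dante Alighieri"],
    "author_corporate": ["Association for Childhood Education International"],
}
indexed = apply_copy_fields(book, copy_fields)
print(indexed["author_search"])
# ['Dante Alighieri', 'Association for Childhood Education International']
```

The source fields keep their original values, so they can still be returned for display while author_search serves the searches.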

Note: In both the source and dest attributes, it’s possible to use a trailing or a leading wildcard, therefore avoiding repetitive code. In the preceding example, we could have just one copyField declaration:

<copyField source="author_*" dest="author_search"/>

Other schema sections

Other than fields and field types, the Solr schema contains some other things as well. This section briefly illustrates them.

Unique key

This field uniquely identifies your document. This is not strictly required but strongly recommended if you want to update your documents, avoid duplicates, and (last but not least) use Solr distributed features.

Default similarity

This element allows you to declare the factory of the class used by Solr to determine the score of documents while searching.


Solr indexing configuration

Once the schema has been defined, it’s time to configure and tune the indexing process by means of another file that resides in the same directory as the schema: solrconfig.xml.

The file contains a lot of sections, but fortunately, many of them are optional and have default values that usually work well in most scenarios. We will try to underline the most important of them with respect to this chapter.

As a general note, it’s possible to use system properties and default values within this file. Therefore, we are able to create a dynamic expression, like this:

<dataDir>${my.data.dir:/var/data/defaultDataDir}</dataDir>

The value of the dataDir element will be replaced at runtime with the value of the my.data.dir system property, or with the default value of /var/data/defaultDataDir if that property doesn’t exist.
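The substitution rule just described can be mimicked with a short Python function. This is a sketch of the ${name:default} convention only; the resolve function and the properties dictionary are assumptions standing in for the JVM system properties, and Solr’s own implementation may treat edge cases differently.

```python
import re

# ${name} or ${name:default}
PLACEHOLDER = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")

def resolve(text, properties):
    """Replace each ${name:default} with properties[name], else with the default."""
    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name in properties:
            return properties[name]
        if default is None:
            raise KeyError("no value and no default for " + name)
        return default
    return PLACEHOLDER.sub(substitute, text)

template = "<dataDir>${my.data.dir:/var/data/defaultDataDir}</dataDir>"

print(resolve(template, {}))
# <dataDir>/var/data/defaultDataDir</dataDir>
print(resolve(template, {"my.data.dir": "/tmp/index"}))
# <dataDir>/tmp/index</dataDir>
```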


General settings

The first part of the solrconfig.xml file contains general settings that are not strictly related to the index phase.

The first is the Lucene match version:

<luceneMatchVersion>LUCENE_47</luceneMatchVersion>

This allows you to control which Lucene version’s behavior Solr will match internally. This is useful to manage migration phases towards the newer versions of Solr, thus allowing backward compatibility with indexes built with previous versions.

A second piece of information you can set here is the data directory, that is, the directory where Solr will create and manage the index. It defaults to a directory called data under $SOLR_HOME.

<dataDir>/var/data/defaultDataDir</dataDir>


Index configuration

The section within the <indexConfig> tag contains a lot of things that you can configure in order to fine-tune the Solr index phase.

A curious thing you can see in this section, in the solrconfig.xml file of the example core, is that most things are commented out. This is very important, because it means that Solr provides good default values for those settings.

The following table summarizes the settings you will find within the <indexConfig> section:

| Setting | Description |
|---|---|
| writeLockTimeout | The maximum allowed time to wait for a write lock on an IndexWriter. |
| maxIndexingThreads | The maximum allowed number of threads that index documents in parallel. Once this threshold has been reached, incoming requests will wait until there’s an available slot. |
| useCompoundFile | If this is set to true, Solr will use a single compound file to represent the index. The default value is false. |
| ramBufferSizeMB | When accumulated document updates exceed this memory threshold, all pending updates are flushed. |
| maxBufferedDocs | This has the same behavior as that of the previous setting, but the threshold is defined as the count of document updates. |
| mergePolicy | The name of the class, along with its settings, that defines and implements the merge strategy. |
| mergeFactor | A threshold indicating how many segments an index is allowed to have before they are merged into one segment. Each time an update is made, it is added to the most recent index segment. When that segment fills up (that is, when the maxBufferedDocs or ramBufferSizeMB threshold is reached), a new segment is created and subsequent updates are inserted there. Once the number of segments reaches this threshold, Solr will merge all of them into one segment. |
| mergeScheduler | The class that is responsible for controlling how merges are executed. |
| lockType | The lock type used by Solr to indicate that a given index is already owned by an IndexWriter. |


Update handler and autocommit feature

The <updateHandler> section configures the component that is responsible for handling requests to update the index.

This is where it’s possible to tell Solr to periodically run unsolicited commits so that clients won’t need to do that explicitly while indexing. Auto-commits can be triggered by declaring two different thresholds:

- maxDocs: The maximum number of documents to add since the last commit
- maxTime: The maximum amount of time (in milliseconds) that can pass after a document is added before it is committed

They are not exclusive, so it’s perfectly legal to have settings such as these:

<autoCommit>
  <maxDocs>5000</maxDocs>
  <maxTime>300000</maxTime>
</autoCommit>

Starting from Solr 4.0, there are two kinds of commit. A hard commit flushes the uncommitted documents to the index, therefore creating and changing segments and data files on the disk. The other type is called soft commit, which doesn’t actually write uncommitted changes but just reopens the internal Solr searcher in order to make uncommitted data in the memory available for searches.

Hard commits are expensive, but after their execution, data is permanently part of the index. Soft commits are fast but transient, so in case of a system crash, changes are lost.

Hard and soft commits can coexist in a Solr configuration. The following is an example that shows this:

<autoCommit>
  <maxTime>900000</maxTime>
</autoCommit>
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>

Here, a soft commit will be triggered every second (1000 milliseconds), and a hard commit will run every 15 minutes (900000 milliseconds).
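How the two thresholds interact can be modeled with a small Python class. This is a deliberately simplified sketch: the class, its method names, and the explicit now_ms clock are assumptions made for illustration, and Solr’s real commit tracking is more involved.

```python
class AutoCommitPolicy:
    """Decide when a commit is due, given maxDocs and maxTime (milliseconds)."""

    def __init__(self, max_docs=None, max_time_ms=None):
        self.max_docs = max_docs
        self.max_time_ms = max_time_ms
        self.pending_docs = 0
        self.first_pending_at_ms = None

    def add(self, now_ms):
        """Register one added document at time now_ms."""
        if self.pending_docs == 0:
            self.first_pending_at_ms = now_ms
        self.pending_docs += 1

    def commit_due(self, now_ms):
        """Return True as soon as either threshold has been reached."""
        if self.pending_docs == 0:
            return False
        if self.max_docs is not None and self.pending_docs >= self.max_docs:
            return True
        if self.max_time_ms is not None:
            return now_ms - self.first_pending_at_ms >= self.max_time_ms
        return False

    def commit(self):
        """Reset the counters after a commit."""
        self.pending_docs = 0
        self.first_pending_at_ms = None

policy = AutoCommitPolicy(max_docs=5000, max_time_ms=300000)
policy.add(now_ms=0)
due_early = policy.commit_due(now_ms=1000)    # one document, one second elapsed
due_late = policy.commit_due(now_ms=300000)   # the time threshold has been reached
print(due_early, due_late)  # False True
```

Whichever threshold is reached first triggers the commit, which is why declaring both is a common way to bound both the number of uncommitted documents and their age.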


RequestHandler

A RequestHandler instance is a pluggable component that handles incoming requests. It is configured in solrconfig.xml as a specific endpoint by means of its name attribute.

Requests sent to Solr can belong to several categories: search, update, administration, and stats. In this context, we are interested in those handlers that are in charge of handling index update requests. Although not mandatory, those handlers are usually associated with a name starting with the /update prefix, for example, the default handler you will find in the configuration:

<requestHandler name="/update" class="solr.UpdateRequestHandler"/>

Prior to Solr 4, each kind of input format (for example, JSON, XML, and so on) required a dedicated handler to be configured. Now the general-purpose update handler, that is, the /update handler, uses the content type of the incoming request in order to detect the format of the input data. The following table lists the built-in content types:

| Mime-type | Description |
|---|---|
| application/xml, text/xml | XML messages |
| application/json, text/json | JSON messages |
| application/csv, text/csv | Comma-separated values |
| application/javabin | Java-serialized objects (Java clients only) |

Each format has its own way of encoding the kind of update operation (for example, add, delete, and commit) and the input documents. This is a sample add command in XML:

<add>
  <doc>
    <field name="id">12020</field>
    <field name="title">Round around midnight</field>
  </doc>
</add>

Later, we will index some data using different techniques and different formats.
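As a preview, the following Python sketch prepares an HTTP request that carries the sample add command to the /update handler, with the content type announcing the XML format. The host, port, and core name in the URL are hypothetical and must be adapted to your installation; the request is only built here, not sent.

```python
import urllib.request

# Hypothetical endpoint: adjust host, port, and core name to your installation.
UPDATE_URL = "http://127.0.0.1:8983/solr/example_core/update"

ADD_COMMAND = """<add>
  <doc>
    <field name="id">12020</field>
    <field name="title">Round around midnight</field>
  </doc>
</add>"""

def build_update_request(body, content_type="application/xml"):
    """Prepare (but do not send) a POST request for the /update handler."""
    return urllib.request.Request(
        UPDATE_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": content_type + "; charset=utf-8"},
        method="POST",
    )

request = build_update_request(ADD_COMMAND)
print(request.get_method(), request.full_url)
# Passing the request to urllib.request.urlopen would send it to a running Solr instance.
```

Switching to another format would only require a different body and a different content_type value, for example application/json.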


UpdateRequestProcessor

The write path of the index process has been conceived by Solr developers with modularity and extensibility in mind. Specifically, the index process has been structured as a chain of responsibilities, where each set of components adds its own contribution to the whole index process.

The UpdateRequestProcessor chain is an important configurable aspect of the index process. If you want to declare your custom chain, you need to add a corresponding section within the configuration. This is an example of a custom chain:

<updateRequestProcessorChain name="my-index-chain">
  <processor class="…"/>
  <processor class="…">
    <str name="aParameterName">aParameterValue</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
</updateRequestProcessorChain>

Defining a new chain requires a name and a set of UpdateRequestProcessorFactory components that are in charge of creating processor instances for that chain.

Note: Actually, the definition of the chain is not enough. It must be enabled (that is, associated with a RequestHandler) in the following way:

<requestHandler name="/myReqHandler" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">chain.name</str>
  </lst>
</requestHandler>

There are a lot of already implemented UpdateRequestProcessor components that you can use in your chain, but in general, it’s easy to create your own processor and customize the index chain.

Tip: The example project with this chapter contains several examples of UpdateRequestProcessor within the org.gazzax.labs.solr.ase.ch2.urp package.


IndexoperationsThissectionshowsyouthebasiccommandsneededforupdatinganindex,byaddingorremovingdocuments.Asageneralnote,eachcommandwewillseecanbeissuedinatleasttwoways:usingthecommandline,throughthecURLtool,forexample(abuilt-intoolinalotofLinuxdistributionsandavailableforallplatforms);andusingcode(thatis,SolrJorsomeotherclientAPI).Whenyouwanttoadddocuments,it’salsopossibletorunthosecommandsfromtheadministrationconsole.

NoteSolrJandclientAPIswillbecoveredlaterinadedicatedchapter.

AnothercommonaspectoftheseinteractionsistheSolrresponse,whichalwayscontainsastatusandaQTimeattribute.Thestatusisareturnedcodeoftheexecutedcommand,whichisalways0iftheoperationsucceeds.TheQTimeattributeistheelapsedtimeoftheexecution.ThisisanexampleoftheresponseinXMLformat:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">97</int>
  </lst>
</response>
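As a rough illustration of how a client could check these two values, the following Java sketch reads the response header with the DOM parser that ships with the JDK. The class and method names (ResponseHeaderSketch, readHeader) are hypothetical and are not part of Solr or SolrJ; a real client would normally let SolrJ do this work.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not a Solr or SolrJ class.
class ResponseHeaderSketch {

    /** Returns the <int> entries of the responseHeader (for example status and QTime). */
    static Map<String, Integer> readHeader(String xml) {
        Document dom;
        try {
            dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed XML response", e);
        }
        Map<String, Integer> header = new HashMap<>();
        NodeList lists = dom.getElementsByTagName("lst");
        for (int i = 0; i < lists.getLength(); i++) {
            Element lst = (Element) lists.item(i);
            if (!"responseHeader".equals(lst.getAttribute("name"))) {
                continue; // some other list in the response
            }
            NodeList ints = lst.getElementsByTagName("int");
            for (int j = 0; j < ints.getLength(); j++) {
                Element value = (Element) ints.item(j);
                header.put(value.getAttribute("name"),
                        Integer.parseInt(value.getTextContent().trim()));
            }
        }
        return header;
    }

    public static void main(String[] args) {
        String xml = "<response><lst name=\"responseHeader\">"
                + "<int name=\"status\">0</int><int name=\"QTime\">97</int>"
                + "</lst></response>";
        Map<String, Integer> header = readHeader(xml);
        System.out.println("succeeded=" + (header.get("status") == 0)
                + ", elapsed=" + header.get("QTime"));
    }
}
```

A status other than 0 would be the signal for the caller to inspect the rest of the response for an error description.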


Add

The add command sends one or more documents to Solr. The documents that are added are not visible until a commit or an optimize command is issued.

We already saw that documents are the unit of information in Solr. Here, depending on the format of the data, one or more documents are sent using the proper representation.

Since the attributes and the content of the message are the same regardless of the format, the formal description of the message structure will be given only once. The following is an add command in XML format:

<add commitWithin="10000" overwrite="true">
  <doc boost="1.9">
    <field name="id">12020</field>
    <field name="title" boost="2.2">Round around midnight</field>
    <field name="subject">Music</field>
    <field name="subject">Jazz</field>
  </doc>
</add>

Let’s discuss the preceding command in detail:

<add>: This is the root tag of the XML document and indicates the operation.
  commitWithin: This is an alternative to the autocommit features we saw previously. Using this optional attribute, the requestor asks Solr to ensure that the documents will be committed within a given period of time (expressed in milliseconds).
  overwrite: This tells Solr to check for, and if found overwrite, documents with the same uniqueKey. If you don’t have a uniqueKey, or you’re confident that you won’t ever add the same document twice, you can get some indexing performance improvements by explicitly setting this flag to false.
<doc>: This represents the document to be added.
  boost: This is an optional attribute that specifies the boost for the whole document (that is, for each field). It defaults to 1.0.
<field>: This is a field of the document with just one value. If the field is multivalued, there will be several fields with the same name and different values.
  boost: This is an optional attribute that specifies the boost for the specific field. It defaults to 1.0.

The same data can be expressed in JSON as follows:

{
  "add": {
    "commitWithin": 10000,
    "overwrite": true,
    "doc": {
      "boost": 1.9,
      "id": 12020,
      "title": {
        "value": "Round around midnight",
        "boost": 2.2
      },
      "subject": ["Music", "Jazz"]
    }
  }
}

As you can see, the information is the same as in the previous example. The only difference is in the encoding of the information, which follows the JSON format.

Sending add commands

We can issue an add command in several ways: using cURL, the administration console, or a client API such as SolrJ.

The cURL tool is a command-line tool used to transfer data with URL syntax. Among other protocols, it supports HTTP and HTTPS, so it’s perfect for sending commands to Solr. These are some examples of add commands sent using cURL:

# curl http://127.0.0.1:8983/solr/update -H "Content-type: text/xml" --data-binary @<your-data-file>

# curl http://127.0.0.1:8983/solr/update -H "Content-type: text/xml" --data-binary '<add commitWithin="10000" overwrite="true">
  <doc boost="1.9">
    <field name="id">12020</field>
    <field name="subject">Jazz</field>
  </doc>
</add>'

The first example uses data contained in a file. The second (useful for short requests) directly embeds the documents in the data-binary parameter. The preceding examples are perfectly valid for JSON and CSV documents as well (obviously, the data format and the content type will change).


Delete

A delete command will mark one or more documents as deleted. This means the target documents are not immediately removed from the index. Instead, a kind of tombstone is placed on them; when the next commit event happens, that data will be removed. Commits and optimizes are commands that make the update changes visible and available. In other words, they make those changes effectively part of the Solr index. We will see both of them later.

Solr allows us to identify the target documents in two different ways: by specifying a set of identifiers or by deleting all documents matched by a query. In the same way as we sent add commands, we can use cURL to issue delete commands:

# curl http://127.0.0.1:8983/solr/update -H "Content-type: text/xml" --data-binary @datafile_with_deletes.xml

# curl http://127.0.0.1:8983/solr/update -H "Content-type: text/xml" --data-binary '<delete>
  <id>92392</id>
  <query>publisher:"Ashler"</query>
</delete>'

In the second example, we issued a command to delete:

The document with 92392 as uniqueKey
All documents that have a publisher attribute with the Ashler value


Commit, optimize, and rollback

Changes resulting from add and delete operations are not immediately visible. They must be committed first; that is, a commit command has to be sent.

We already explored hard and soft unsolicited commits in the Index configuration section. The same command can be explicitly sent to Solr by clients.

Although we previously described the difference between hard and soft commits, it’s important to remember that a hard commit is an expensive operation, causing changes to be permanently flushed to disk. Soft commits operate exclusively in memory, and are therefore very fast but transient; so, in the event of a JVM crash, softly committed data is lost.

Tip

In a prototype I’m working on, we index data coming from traffic sensors in Solr. As you can imagine, the input flow is continuous; new data can arrive several times in a second. A control system needs to execute a given set of queries at short periodic intervals, for example, every few seconds. In order to make the most recent data available to that system, we issue a soft commit every second and a hard commit every 20 minutes. At the moment, this seems to be a good compromise between the availability of fresh data and the risk of data loss (it could still happen during those 20 minutes).

For those interested, the Solr extension we will use in that project is available on GitHub, at https://github.com/agazzarini/SolRDF. It allows Solr to index RDF data, and it is a good example of the capabilities of Solr in the realm of customization.

A third kind of commit, which is actually a hard commit, is the so-called optimize. With optimize, other than producing the same results as those of a hard commit, Solr will merge the current index segments into a single segment, resulting in a set of intensive I/O operations. The merge usually occurs in the background and is controlled by parameters such as merge scheduler, merge policy, and merge factor. Like the hard commit, optimize is a very expensive operation in terms of I/O because, apart from costing the same as a hard commit, it must have some temporary space available on the disk to perform the merge.

It is possible to send the commit or the optimize command together with the data to be indexed:

# curl http://127.0.0.1:8983/solr/update?commit=true -H "Content-type: text/xml" --data-binary @<your-data-file>

# curl http://127.0.0.1:8983/solr/update?optimize=true -H "Content-type: text/xml" --data-binary @<your-data-file>

The message payload can also be a commit command:

# curl http://127.0.0.1:8983/solr/update -H "Content-type: text/xml" --data-binary '<commit/>'

A commit has a few additional Boolean parameters that can be specified to customize the service behavior:

Parameter       Description
waitSearcher    The command won’t return until a new searcher is opened and registered as the main searcher
waitFlush       The command won’t return until uncommitted changes are flushed to disk
softCommit      If this is true, a soft commit will be executed

Before committing any pending change, it’s possible to issue a rollback to remove uncommitted add and delete operations. The following are examples of rollback requests:

# curl http://127.0.0.1:8983/solr/update?rollback=true

# curl http://127.0.0.1:8983/solr/update -H "Content-type: text/xml" --data-binary '<rollback/>'


Extending and customizing the index process

As we saw before, the Solr index chain is highly customizable at different points. This section will give you some hints and examples to create your own extension in order to customize the indexing phase.


Changing the stored value of fields

One of the most frequent needs that I encounter while indexing bibliographic data is to correct or change the headings (labels) belonging to the incoming records (documents).

Note

This has nothing to do with the text analysis we have previously seen. Here, we are dealing with unwanted (wrong) values, diacritics that need to be replaced, or, in general, labels in the original record that we want to change before showing them to the end users. In Solr terms, we want to change the stored value of a field before it gets indexed.

Suppose a library has a lot of records and wants to publish them in an OPAC. Unfortunately, many of those records have titles with a trailing underscore, which has a special meaning for librarians. While this is not a problem for the cataloguing software (because librarians are aware of that convention), it is not acceptable to end users, who will surely see it as a typo. So if we have records with titles such as “A good old story_” or “This is another title_”, we want our application to show “A good old story” and “This is another title”, without underscores, when the user searches for those records.

Remember that analyzers and tokenizers declared in your schema only act on the indexed value of a given field. The stored value is copied verbatim as it arrives, so there’s no chance to modify it once it is indexed.

In these cases, an UpdateRequestProcessor perfectly fits our needs. The example project associated with this chapter contains several examples of custom UpdateRequestProcessors. Here, we are interested in RemoveTrailingUnderscoreProcessor, which can be found under src/main/java in the org.gazzax.labs.solr.ase.ch2.urp package.

As you can see, writing an UpdateRequestProcessor requires two classes to be implemented:

Factory: A class that extends org.apache.solr.update.processor.UpdateRequestProcessorFactory
Processor: A class that extends org.apache.solr.update.processor.UpdateRequestProcessor

The first is a factory that creates concrete instances of your processor and can be configured with a set of custom parameters in solrconfig.xml:

<processor class="org.gazzax.labs.solr.ase.ch2.urp.RemoveTrailingUnderscoreProcessorFactory">
  <arr name="fields">
    <str name="fields">title</str>
    <str name="fields">author</str>
  </arr>
</processor>

In this case, instead of hardcoding the names of the fields that we want to check, we define an array parameter called fields. That parameter is retrieved in the factory, specifically in the init() method, which will be called by Solr when the factory is instantiated:

private String[] fields;

@Override
public void init(NamedList args) {
    SolrParams parameters = SolrParams.toSolrParams(args);
    this.fields = parameters.getParams("fields");
}

The other relevant section of the factory is the getInstance method, where a new instance of the processor is created:

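@Override
public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse res,
        UpdateRequestProcessor next) {
    // Each request gets its own processor, linked to the next one in the chain
    return new RemoveTrailingUnderscoreProcessor(next, fields);
}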

A new processor instance is created with the next processor in the chain and the list of target fields we configured. Now the processor receives those parameters and can add its contribution to the index phase. In this case, we want to put some logic before the add phase:

@Override
public void processAdd(final AddUpdateCommand command) throws IOException {
    // 1. Retrieve the Solr (input) document
    SolrInputDocument document = command.getSolrInputDocument();

    // 2. Loop through the target fields
    for (String name : fields) {
        // 3. Get the field value
        // (we assume target fields are single-valued strings, for simplicity)
        Object fieldValue = document.getFieldValue(name);
        String value = (fieldValue instanceof String) ? (String) fieldValue : null;

        // 4. Check the value and change it if needed
        if (value != null && value.endsWith("_")) {
            String newValue = value.substring(0, value.length() - 1);
            document.setField(name, newValue);
        }
    }

    // 5. IMPORTANT: forward to the next processor in the chain
    super.processAdd(command);
}
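To experiment with the value-rewriting step outside a running Solr instance, the same logic can be sketched in plain Java. In this sketch a java.util.Map stands in for the SolrInputDocument, and the class and method names (TrailingUnderscoreSketch, removeTrailingUnderscore) are hypothetical; it only illustrates the check-and-replace logic, not the Solr API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-alone sketch; a Map plays the role of the input document.
class TrailingUnderscoreSketch {

    /** Removes a single trailing underscore from the given fields, in place. */
    static void removeTrailingUnderscore(Map<String, Object> document, String... fields) {
        for (String name : fields) {
            Object value = document.get(name);
            // Only single-valued string fields are handled, as in the processor above.
            if (value instanceof String && ((String) value).endsWith("_")) {
                String text = (String) value;
                document.put(name, text.substring(0, text.length() - 1));
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> document = new LinkedHashMap<>();
        document.put("id", "1_");
        document.put("title", "A good old story_");
        document.put("author", "Unknown");
        removeTrailingUnderscore(document, "title", "author");
        System.out.println(document);
    }
}
```

Because only title and author are passed as target fields, the id value keeps its underscore, which mirrors the effect of listing the target fields in solrconfig.xml instead of hardcoding them.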

Tip

You can find the source code of the whole example under the org.gazzax.labs.solr.ase.ch2.urp package of the source folder in the project associated with this chapter. The package contains additional examples of UpdateRequestProcessor.


Indexing custom data

The default UpdateRequestHandler is very powerful because it covers the most popular data formats. However, there are some cases where data is available only in a legacy format, and we need to do something in order to have Solr work with it.

In this example, I will use a flat file, that is, a simple text file that typically describes records with fields of data defined by fixed positions. Such files are very popular in integration projects between banks and ERP systems (just to give you a concrete context).

Tip

In the example project associated with this chapter, you can find an example of such a file describing books under the src/solr/solr-homes/flatIndexer/example-input-data folder.

Here, each line has a fixed length of 107 characters and represents a book, with the following format:

Field     Position (start inclusive, end exclusive)
Id        0 to 8
ISBN      8 to 22
Title     22 to 67
Author    67 to 107

There are two approaches in this scenario. The first moves the responsibility to the client side, by creating a custom indexer client that gets the data in any format and converts it into one of the supported formats. We won’t cover this scenario right now, as we will discuss client APIs in a later chapter.

Another approach could be a custom extension of the UpdateRequestHandler. In this case, we want to have a new content type (text/plain) and a corresponding custom handler to load that kind of data. There are two things we need to implement. The first is a subclass of the existing UpdateRequestHandler:

public class FlatDataUpdate extends UpdateRequestHandler {
    @Override
    protected Map<String, ContentStreamLoader> createDefaultLoaders(NamedList n) {
        Map<String, ContentStreamLoader> registry = new HashMap<String, ContentStreamLoader>();
        registry.put("text/plain", new FlatDataLoader());
        return registry;
    }
}

Here, we are simply overriding the content type registry (the registry in the superclass cannot be modified) to add our content type, with a corresponding handler called FlatDataLoader. This class extends ContentStreamLoader and implements the parsing logic of the flat data:

public class FlatDataLoader extends ContentStreamLoader

The custom loader must provide a load(…) method to implement the stream parsing logic:

@Override
public void load(
        SolrQueryRequest req,
        SolrQueryResponse rsp,
        ContentStream stream,
        UpdateRequestProcessor processor) throws Exception {
    // 1. Get a reader associated with the content stream
    BufferedReader reader = null;
    try {
        reader = new BufferedReader(stream.getReader());
        String actLine = null;
        while ((actLine = reader.readLine()) != null) {
            // 2. Sanity check: skip lines that do not have the expected length
            if (actLine.length() != 107) {
                continue;
            }

            // 3. Parse the line and create the document
            SolrInputDocument doc = new SolrInputDocument();
            doc.setField("id", actLine.substring(0, 8));
            doc.setField("isbn", actLine.substring(8, 22));
            doc.setField("title", actLine.substring(22, 67));
            doc.setField("author", actLine.substring(67));

            // 4. Wrap the document in an add command and send it through the chain
            AddUpdateCommand command = getAddCommand(req);
            command.solrDoc = doc;
            processor.processAdd(command);
        }
    } finally {
        // Close the reader
        if (reader != null) {
            reader.close();
        }
    }
}
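The fixed-position parsing used by the load method can be tried on its own with a small plain-Java sketch. It relies on the same offsets (0, 8, 22, 67) and the same 107-character sanity check; the class name FlatRecordSketch, the pad helper, the sample values, and the trimming of padding spaces are assumptions added for illustration and are not taken from the example project.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-alone sketch of the fixed-width parsing step.
class FlatRecordSketch {
    static final int LINE_LENGTH = 107;

    /** Parses one fixed-width line; returns null when the line has an unexpected length. */
    static Map<String, String> parse(String line) {
        if (line == null || line.length() != LINE_LENGTH) {
            return null;
        }
        Map<String, String> record = new LinkedHashMap<>();
        // Trimming the padding is an addition of this sketch.
        record.put("id", line.substring(0, 8).trim());
        record.put("isbn", line.substring(8, 22).trim());
        record.put("title", line.substring(22, 67).trim());
        record.put("author", line.substring(67).trim());
        return record;
    }

    /** Pads a value with spaces on the right, up to the given width. */
    static String pad(String value, int width) {
        return String.format("%-" + width + "s", value);
    }

    public static void main(String[] args) {
        // Made-up sample record: 8 + 14 + 45 + 40 = 107 characters.
        String line = pad("1", 8) + pad("9781234567897", 14)
                + pad("A good old story", 45) + pad("Unknown author", 40);
        System.out.println(parse(line));
        System.out.println(parse("too short"));
    }
}
```

Returning null for malformed lines corresponds to the continue statement in the loader: such lines are silently skipped, so a production loader would probably want to log or count them.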

If you want to see this example in action, just open the command line in the folder of the project associated with this chapter, and run the following command:

# mvn cargo:run -PflatIndexer

Tip

You can do the same with Eclipse by creating a new Maven launch as previously described. In that case, you will also be able to put debug breakpoints in the source code (your source code and the Solr source code) and proceed step by step through the Solr index process.

Once Solr has started, open another shell, change the directory to the project folder, and run the following command:


# curl http://127.0.0.1:8983/solr/flatIndexer/update?commit=true -H "Content-type: text/plain" --data-binary @src/solr/solr-homes/flatIndexer/example-input-data/books.flat

You should see something like this in the console:

[UpdateHandler] start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
[SolrCore] SolrDeletionPolicy.onCommit: commits: num=2
[SolrCore] newest commit generation = 4
[SolrIndexSearcher] Opening Searcher@77ee04bb[flatIndexer] main
[UpdateHandler] end_commit_flush

Now open the administration console at http://127.0.0.1:8983/solr/#/flatIndexer/query, and click on the Execute Query button. You should see three documents in the right pane.

Tip

You can find the source code of the entire example under the org.gazzax.labs.solr.ase.ch2.handler package of the source folder in the project associated with this chapter.


Troubleshooting

This section provides suggestions and tips on how to resolve some common problems encountered when dealing with indexing operations.


Multivalued fields and the copyField directive

The cardinality of a field can be tricky, especially when used in conjunction with copyField directives, where two or more single-valued fields are copied to another field, like this:

<field name="author_person" … required="true"/>
<field name="author_corporate" … required="true"/>
<field name="author_search" … multiValued="true"/>
<copyField source="author_person" dest="author_search"/>
<copyField source="author_corporate" dest="author_search"/>

In this case, the destination field must be multivalued. Otherwise, it would receive two values from two different source fields, and Solr would refuse to index the whole document, reporting: ERROR multiple values encountered for non multiValued field author_search.


The copyField input value

A common misunderstanding about the copyField directive concerns the value that is copied from the source to the dest field. Suppose you define field A, field B, and a copyField directive from A to B:

<field name="A" type="text_without_stopwords" … />
<field name="B" type="light_stemmed_text" … />
<copyField source="A" dest="B"/>

Irrespective of the text analysis we defined for field A and field B, field B will get the original (stored) value of field A, without any text analysis applied. In other words, the incoming value for field A is copied verbatim to field B before any text analysis is applied to it.

So, suppose we have a value of “one and two” for field A, and “and” is configured as a stopword. The “one and two” value is injected into field A, which triggers the text analysis for the text_without_stopwords type, resulting in an indexed value (for field A) composed of two tokens: “one” and “two” (“and” has been removed).

Next, the original value of field A (“one and two”) is copied to field B, triggering the text analysis associated with that field.
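The order of these steps can be mimicked with a toy sketch in plain Java. The two methods below are deliberately simplistic stand-ins for the text_without_stopwords and light_stemmed_text types (the real analysis chains are defined in the schema and are not shown in this chapter), and the class name CopyFieldSketch is hypothetical. The point is only that field B is fed with the raw incoming value, not with the tokens produced for field A.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of "copy first, analyze afterwards"; not how Solr implements analysis.
class CopyFieldSketch {

    /** Toy stand-in for text_without_stopwords: splits on whitespace and drops "and". */
    static List<String> withoutStopwords(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.equals("and")) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Toy stand-in for light_stemmed_text: splits on whitespace and strips a final "s". */
    static List<String> lightStemmed(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            boolean plural = token.length() > 1 && token.endsWith("s");
            tokens.add(plural ? token.substring(0, token.length() - 1) : token);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String incoming = "one and two";
        // The copy uses the raw incoming value, before any analysis of field A.
        String copiedToB = incoming;
        System.out.println("A indexed tokens: " + withoutStopwords(incoming));
        System.out.println("B indexed tokens: " + lightStemmed(copiedToB));
    }
}
```

In this toy model the token “and” survives in field B even though it was removed from field A, which is exactly the behavior described above.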


Required fields and the copyField directive

A required attribute on a static field denotes that an incoming document must contain a valid value for that field. If a field is the target (destination) of a copyField directive, the required attribute means that, in some way, there should be a value for that field coming from its sources. See the following example:

<field name="A" … required="false"/>
<field name="B" … required="false"/>
<field name="C" … required="true" multiValued="true"/>
<copyField source="A" dest="C"/>
<copyField source="B" dest="C"/>

Fields A and B are not required and they are copied into field C. Since field C is mandatory, you have to make sure that, for each input document, at least one of A and B has a valid value; otherwise, Solr will complain about a missing value for field C.


Stored text is immutable!

A stored field value is the text that comes from the Solr (input) document. It is copied verbatim, exactly as it arrives. Any text analysis configured in the schema for a given field type won’t affect that value.

In other words, the stored value won’t be changed at all by Solr during the index phase.


Data not indexed

The design of UpdateRequestProcessor follows the decorator pattern, consisting of a nested chain of responsibility where each link is executed one after the other. Your custom UpdateRequestProcessor receives a reference to the next processor in the chain when it is created. Once its work has been done, it is crucial to forward the execution flow to the next processor (for example, by calling super.processAdd(command) at the end of processAdd). Otherwise, the chain will be interrupted and no data will be indexed.
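The effect of a missing forward call can be reproduced with a small stand-alone model of the chain. Everything in this sketch (ProcessorChainSketch and its nested classes) is hypothetical and written in plain Java; it imitates the shape of the Solr classes without using them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a processor chain; none of these are Solr classes.
class ProcessorChainSketch {

    /** Minimal stand-in for an update processor: by default it just forwards. */
    static class Processor {
        private final Processor next;

        Processor(Processor next) {
            this.next = next;
        }

        void processAdd(String document) {
            if (next != null) {
                next.processAdd(document);
            }
        }
    }

    /** Last link of the chain: records the document as indexed. */
    static class IndexingProcessor extends Processor {
        private final List<String> indexed;

        IndexingProcessor(List<String> indexed) {
            super(null);
            this.indexed = indexed;
        }

        @Override
        void processAdd(String document) {
            indexed.add(document);
        }
    }

    /** A custom processor that can (wrongly) skip the forwarding step. */
    static class CustomProcessor extends Processor {
        private final boolean forward;

        CustomProcessor(Processor next, boolean forward) {
            super(next);
            this.forward = forward;
        }

        @Override
        void processAdd(String document) {
            // ... custom logic would go here ...
            if (forward) {
                super.processAdd(document);
            }
        }
    }

    /** Runs one document through a two-link chain and returns what was indexed. */
    static List<String> run(boolean forward) {
        List<String> indexed = new ArrayList<>();
        new CustomProcessor(new IndexingProcessor(indexed), forward).processAdd("doc-1");
        return indexed;
    }

    public static void main(String[] args) {
        System.out.println("forwarding: " + run(true));
        System.out.println("not forwarding: " + run(false));
    }
}
```

When the custom link does not forward, no error is raised anywhere; the document simply never reaches the last link. That silence is what makes this problem hard to spot in a real chain.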


Summary

In this chapter, we saw the main concepts of the indexing phase in Solr. Being an inverted-index-based search engine, Solr strongly relies on the indexing phase by allowing a customizable and tunable index chain.

The Solr write path is a chain of responsibility consisting of several actors, each of them with a precise role in the overall process. While you must know, configure, and control those components as a user, you must also be aware of their high level of extensibility (as a developer). This allows you to adapt and, where needed, customize a Solr instance according to your specific needs.

We addressed the concepts that form the Solr data model, such as documents, core, schema, fields, and types. We also looked at the indexing configuration and the involved components such as update request processors, update chains, and request handlers. We finally described how to configure these components and write extensions on top of them.

The purpose of the indexing phase and the index itself is to optimize speed and performance in finding relevant documents during searches. Hence, the whole process is not useful without the search phase, which is the subject of the next chapter.


Chapter 3. Searching Your Data

Once data has been properly indexed, it’s definitely time to search! The indexing phase makes no sense if things end there. Data is indexed mainly to speed up and facilitate searches.

This chapter focuses on the search capabilities offered by Solr and illustrates the several components that contribute to its read path.

The chapter will cover the following topics:

Querying
Search configuration
The Solr read path: query parsers, search components, request handlers, and response writers
Extending Solr
Troubleshooting


The sample project

Throughout this chapter, we will use a sample Solr instance with a configuration that includes all the topics we will gradually describe. This instance will have a set of simple documents representing music albums. These are the first three documents:

<doc>
  <field name="id">1</field>
  <field name="title">A Modern Jazz Symposium of Music and Poetry</field>
  <field name="composer">Charles Mingus</field>
</doc>
<doc>
  <field name="id">2</field>
  <field name="title">Where Jazz meets Poetry</field>
  <field name="artist">Raphael Austin</field>
</doc>
<doc>
  <field name="id">3</field>
  <field name="title">I'm In The Mood For Love</field>
  <field name="composer">Charlie Parker</field>
  <field name="genre">Jazz</field>
</doc>

The source code of the sample project associated with this chapter contains the entire Maven project, which can be either loaded in Eclipse or used via the command line. As a preliminary step, open a shell in the project folder (or run the following command within Eclipse) and type this:

# mvn clean cargo:run -Pquerying

The preceding command will start a new Solr instance, with sample data preloaded.

Tip

The sample data is automatically loaded at startup by means of a custom SolrEventListener. You can find the source code under the org.gazzax.labs.solr.ase.ch3.listener package.

You can use the page located at http://127.0.0.1:8983/solr/#/example/query to experiment by yourself with the several things we will discuss.

Tip

If you loaded the project in Eclipse, under /src/dev/eclipse you will find the launch configuration used to start Solr.


Querying

Solr can be seen as a tell-and-ask system; that is, you first put in (index) some data, then it can answer questions you ask (query) about that data. Since the actors involved in these interactions are not humans, Solr provides a formal and systematic way to execute both index and query operations. Specifically, from a query perspective, that requires a specialized language that can be interpreted by Solr in order to produce the expected answers. Such a language is usually called a query language.


Search-related configuration

The solrconfig.xml file has a <query> section that contains several search settings. Most of them are related to caches, a critical topic that will be described in Chapter 5, Administering and Tuning Solr.

As we already said for the index section, all those parameters have good defaults that work well in a lot of scenarios. This list describes the relevant settings (cache settings are not included):

Searcher lifecycle listeners: Whenever a searcher is opened, it’s possible to configure one or more queries that will be automatically executed in order to prepopulate caches.
Use cold searcher: If a search is issued and there isn’t a registered searcher, the current warming searcher is immediately used. If this attribute is set to false, the incoming request will wait until the warming completes.
Max warming searchers: This is the maximum number of searchers that can be warming in parallel. The example configuration contains a value of 2, which is good for pure searcher instances. For indexers (which could also be searchers), a higher value could be needed.


Query analyzers

In the previous chapter, we discussed analyzers. Their meaning here is the same; the only difference is their input value. When we index data, that value is the content of the fields that make up the input documents. At query time, the analyzer processes a value, term, or phrase that comes from a query parser and represents one component of the user-entered query.

Tip

In the previous chapter, we used the analysis page to see how text analysis works at index time. That very page has an additional section that can be used to see the same process using the query analyzer.


Common query parameters

A query to Solr, other than a search string, includes several parameters that are passed using standard HTTP procedures, that is, name/value pairs in the query string, like this:

http://127.0.0.1:8080/solr/ch3/search?q=history&start=10&rows=10&sort=title asc

While some of them strictly depend on the component that will be in charge of handling the request, there is a set of common parameters. The following table describes them:

Parameter       Description
q               The search string that indicates what we are asking Solr, according to a given syntax.
start           The start offset within search results. This is used to paginate search results.
rows            The maximum size (that is, number of documents) of the returned page.
sort            A comma-separated list of (indexed) fields that will be used to sort search results. Each field must be followed by the keyword asc (for ascending order) or desc (for descending order).
defType         Indicates the query parser that will interpret the specific search string. Each query parser has different features and different rules and accepts a different syntax in queries.
fl              A comma- or space-separated list of fields that will be returned as part of the matched documents.
fq              A filter query. The parameter can be repeated.
wt              The response output writer that will determine the response output format.
debugQuery      If this is true, an additional section will be appended to the response with an explanation of the current read path.
explainOther    The unique key of a document that is not part of search results for a given query. Solr will add a section to the response explaining why the document associated with that identifier has been excluded from search results.
timeAllowed     A constraint on the maximum amount of time allowed for query execution. If the timeout expires, Solr will return only partial results.
cache           Enables or disables query caching.
omitHeader      By default, the response contains an information header with some metadata about the query execution (for example, input parameters or query execution time). If this parameter is set to true, then the header is omitted from the response.

The following are some example queries:

http://localhost:8983/solr/example/query?q=charles&fq=genre:jazz&rows=5&omitHeader=true&debugQuery=true

http://localhost:8983/solr/example/query?q=charles&rows=10&omitHeader=true&debugQuery=true&explainOther=2

http://localhost:8983/solr/example/query?q=*:*&start=5&rows=5
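When such URLs are assembled in code rather than typed by hand, parameter values such as genre:jazz or title asc need to be URL-encoded. The following plain-Java sketch builds a query string from name/value pairs and derives the start offset from a page number. The class and method names (QueryUrlSketch, build, startOffset) are hypothetical, and the zero-based page convention is an assumption of this sketch; SolrJ offers its own, richer way to express queries.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that only assembles a URL; it does not contact Solr.
class QueryUrlSketch {

    /** Appends URL-encoded name/value pairs to the given request handler URL. */
    static String build(String handlerUrl, Map<String, String> parameters) {
        StringBuilder url = new StringBuilder(handlerUrl);
        char separator = '?';
        for (Map.Entry<String, String> parameter : parameters.entrySet()) {
            url.append(separator)
               .append(encode(parameter.getKey()))
               .append('=')
               .append(encode(parameter.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Converts a zero-based page number into the value of the start parameter. */
    static int startOffset(int page, int rows) {
        return page * rows;
    }

    public static void main(String[] args) {
        Map<String, String> parameters = new LinkedHashMap<>();
        parameters.put("q", "charles");
        parameters.put("fq", "genre:jazz");
        parameters.put("sort", "title asc");
        parameters.put("start", String.valueOf(startOffset(1, 5)));
        parameters.put("rows", "5");
        System.out.println(build("http://localhost:8983/solr/example/query", parameters));
    }
}
```

A Map holds only one value per name, so this sketch cannot express a repeated fq parameter; a list of pairs would be needed for that case.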


As you can imagine, the q parameter, which contains the query, will be very important in this chapter. Besides this, there are two other parameters, fl (field list) and fq (filter queries), that will be described in the next sections, because they have some interesting aspects.

Field lists

The fl parameter indicates which fields (among fields that have been marked as stored) will be returned in documents within a query response. Think of these two scenarios:

A schema that contains a lot of fields, probably defining multiple entities (for example, books and authors). I’m looking for books, so I don’t want to see any author attributes (and vice versa).
A schema that contains stored fields with a lot of text, used for the highlighting component, for example (it requires that highlight snippets come from a stored field). When I execute queries, I don’t want those fields to be returned as part of the matching documents. In other words, I want to exclude those fields from search results.

The fl parameter specifies the list of fields that will make up each matched document, thus filtering out unwanted attributes. The parameter accepts a space- or comma-separated list of values, where each value can be any of the following:

A field name (for example, title, artist, released, and so on).
The literal score, which is a virtual field indicating the computed score for each document.
A glob, which is an expression that dynamically matches one or more fields by means of the * and ? wildcard characters (for example, art*, r?leas?d, and re?leas*).
The asterisk (*) character, which matches all available (that is, stored) fields.
A function that, when evaluated, will produce a value for a virtual field that will be added to documents.
A transformer. Like a function, this is another way to create virtual fields in documents, with additional data such as the Lucene document ID, shard identifier, or the query execution explanation.

Explicitfields,score,functions,andtransformerscanbealiasedbyprefixingthemwithanamethatwillbeusedinplaceoftherealnameofthatmember.

TipSOLR-3191trackstheactivityrelatedtoaso-calledfieldexclusionfeature.Oncethispatchhasbeenapplied,itwillbepossibletoexplicitlyindicatewhichfieldsmustnotbepartofthereturneddocuments.

The following table lists some examples of the fl parameter:

Example Description

*,score All stored fields and the score virtual field

t*,*d All fields starting with t and all fields ending with d

max(old_price,new_price) Maximum value between old_price and new_price

max_price:max(p1,p2) A function alias

title,t_alias:title,[docid] Title, aliased title, and a transformer

The difference between the third and fourth examples in the preceding table is in the name of the field that will hold the function value. In the first case, it will be the function itself; in the other, it will be a virtual field called max_price.

Tip

With the sample instance running, you can try these examples by issuing a request such as http://127.0.0.1:8983/solr/example/query?q=id:1&fl= and supplying the value of the fl parameter.

A complete list of available functions can be accessed at http://wiki.apache.org/solr/FunctionQuery#Available_Functions.

A complete list of available transformers can be read at https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents.
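To get a feeling for how glob entries in fl select fields, here is a small Python sketch. It only emulates the * and ? wildcards with the standard fnmatch module and is not how Solr resolves field lists internally; the field names and the select_fields helper are assumptions made for the example.

```python
from fnmatch import fnmatchcase


def select_fields(stored_fields, fl):
    """Return the stored fields matched by at least one entry of fl.

    fl is a space- or comma-separated list of field names or globs.
    """
    patterns = fl.replace(",", " ").split()
    return [f for f in stored_fields
            if any(fnmatchcase(f, p) for p in patterns)]


# Hypothetical stored fields of a music catalog schema.
fields = ["id", "title", "artist", "genre", "released"]

print(select_fields(fields, "t*,*d"))     # ['id', 'title', 'released']
print(select_fields(fields, "r?leas?d"))  # ['released']
```

Note that t*,*d is a list of two independent globs, so a field is selected if it matches either of them.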

Filter queries

Filter queries perform a kind of intersection on top of the documents resulting from the execution of the main query. A filter query is like having a required condition in your main query (that is, an additional clause concatenated with the AND operator), but with some important differences:

• It is executed separately and before the main query

• The filter and the intersection are applied on top of the main query results

• It doesn't influence the score of the documents, which is computed in the execution of the main query

• The results of filter queries are cached separately so that they can be reused for further executions

There can be more than one fq parameter in a search query. In this case, the result of the overall execution will take into account all filter clauses, therefore resulting in documents that satisfy the intersection between the main results and the results of each filter query.

Filter query caching is one of the most crucial features of Solr. A filter query's design should reflect the access pattern of requestors as much as possible. Consider this filter query:

fq=genre:Jazz AND released:1981

The preceding query will cache the results of those two clauses together. So, if your application provides two separate filters (for the end users), genre and released, the following filter queries won't benefit from this cache, and they will be cached (again) separately:


fq=genre:Jazz

fq=released:1981

In this situation, the first query should be rewritten in the following way, allowing reuse of the cache associated with each filter query:

fq=genre:Jazz&fq=released:1981
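The effect of the two designs on cache reuse can be illustrated with a toy model. The following Python sketch is not Solr's filter cache; it simply assumes that each distinct filter query string gets its own cache entry, which is the behavior described above. The FilterCache class and the sample documents are hypothetical.

```python
class FilterCache:
    """Toy stand-in for a filter cache (not Solr's implementation)."""

    def __init__(self, docs):
        self.docs = docs
        self.entries = {}  # filter query string -> set of matching doc ids
        self.misses = 0    # how many times a filter had to be computed

    def doc_set(self, fq, predicate):
        """Return the cached doc set for fq, computing it on first use."""
        if fq not in self.entries:
            self.misses += 1
            self.entries[fq] = {d["id"] for d in self.docs if predicate(d)}
        return self.entries[fq]


docs = [
    {"id": 1, "genre": "Jazz", "released": 1981},
    {"id": 2, "genre": "Jazz", "released": 1959},
    {"id": 3, "genre": "Rock", "released": 1981},
]


def is_jazz(d):
    return d["genre"] == "Jazz"


def is_1981(d):
    return d["released"] == 1981


def is_1959(d):
    return d["released"] == 1959


cache = FilterCache(docs)

# Request 1: one combined filter. Only this exact string can reuse the entry.
combined = cache.doc_set("genre:Jazz AND released:1981",
                         lambda d: is_jazz(d) and is_1981(d))

# Request 2: two separate filters, intersected afterwards (two new entries).
separate = cache.doc_set("genre:Jazz", is_jazz) & cache.doc_set("released:1981", is_1981)

# Request 3: the user changes only the year, so the genre entry is reused.
other_year = cache.doc_set("genre:Jazz", is_jazz) & cache.doc_set("released:1959", is_1959)

print(combined, separate, other_year, cache.misses)  # {1} {1} {2} 4
```

Requests 1 and 2 return the same documents, but only the entries created by request 2 help when the user later changes a single filter.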


Query parsers

A query parser is a component responsible for translating a search string or expression into specific instructions for Solr. Every query parser understands a given syntax for expressing queries.

Solr comes with several query parsers, giving requestors a wide range of ways of asking for what they need.


The Solr query parser

The Solr query parser, often mistakenly called the Lucene query parser, is implemented in org.apache.solr.search.LuceneQParserPlugin. It is rather a schema-driven superset of the default Lucene query parser.

Note

Note the Plugin suffix of the class name. Solr provides an extensible framework for creating and plugging in your own query parser.

The following sections will describe the relevant aspects of this parser.

Terms, fields, and operators

You've already met terms. They are atomic units of information resulting from an analysis applied to a given text. At index time, that text is the value of a field belonging to a given (input) document. At query time, terms come from the user-entered query string. Specifically, a query string is broken into terms, fields, and operators.

Terms can be simple or compound; for example, they can be single words such as CM, Standard, and 1959, or phrases such as "Goodbye Pork Pie Hat". Phrases are two or more words surrounded by double quotes.

Fields are what we declared in the schema.xml file. Their use within a search string allows a requestor to express instructions such as "search x in field y", where x is a term or a phrase and y is the field name. Here are some examples of the use of fields:

title:"Where Jazz meets Poetry"

composer:Mingus

Operators are keywords or symbols used as conjunctions between several field-value criteria in order to create complex expressions, such as these:

title:Jazz OR composer:Charlie AND released:1959

genre:Jazz AND NOT released:1959

The following table describes the available operators:

Operator Description

AND A conjunction between two criteria, both of which must be satisfied

OR A conjunction between two criteria where at least one must be satisfied

+ Marks a term as required

-/NOT Marks a term as prohibited

It's also possible to use a pair of parentheses to group several field or value criteria, like this:

(released:1957 AND composer:Mingus) OR (released:1976 AND NOT genre:Jazz) OR released:(1988 OR 1959)

Boosts

Boosting allows you to control the relevance of a given matching document, thus offering a way to give some query results more importance than others. For example, if you are mainly interested in Jazz and less in Fusion albums, you could use this:

+genre:Fusion +genre:Jazz^2

The boost factor is inserted after a field-value criterion and prefixed with a caret symbol. It has to be greater than 0 and, since it is a factor, a value between 0 and 1 represents a negative boost (that is, it lowers the relevance of the match). If it is absent, a default boost factor of 1 will be applied.

Wildcards

The wildcard characters, * and ?, can be used within terms, with zero or more occurrences. They cannot be applied to compound terms (that is, search phrases) or to numeric and date types. The ? wildcard matches a single character, while * matches zero or more sequential characters. Here are some examples of wildcards:

(title:moder* AND artist:Min*) OR artist:(Yngw?e AND M?lm*)

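Fuzzy

The tilde symbol (~) at the end of a term enables a so-called fuzzy query, allowing you to match terms that are similar to that term. Fuzzy matching is based on the Damerau-Levenshtein distance algorithm. After the tilde, you can put an integer between 0 and 2, indicating the maximum number of edits allowed between the query term and a matching term (0 accepts only exact matches). If no value is given, the maximum distance of 2 is used. A value between 0 and 1, like the one in the following example, follows the older similarity-based syntax, where a higher value requires a higher similarity and the default was 0.5; Solr 4 still accepts it and converts it to an edit distance.

With the example Solr instance running, open the query page in the admin console and type the following query:

artist:Charles~0.7

The query response will contain two results. The first is an album by Charles Mingus, which is a perfect match for the search term entered. The second artist is Charlie Parker, whose name is similar but not equal to Charles.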
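To make the notion of similarity concrete, the following Python sketch computes an edit distance that counts insertions, deletions, substitutions, and transpositions of adjacent characters. It is the textbook "optimal string alignment" variant, shown for illustration only; Lucene evaluates fuzzy queries with automata rather than with this table-based approach.

```python
def damerau_levenshtein(a, b):
    """Edit distance with adjacent transpositions (optimal string alignment)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all characters of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all characters of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


print(damerau_levenshtein("charles", "charles"))  # 0
print(damerau_levenshtein("charles", "charlie"))  # 2
print(damerau_levenshtein("mingus", "mignus"))    # 1
```

With a maximum distance of 2, both "charles" and "charlie" are acceptable matches for Charles, which is consistent with the two results described above.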

Proximity

The same symbol that is used for a fuzzy query has a different meaning when used in conjunction with phrase queries. Now run the following query:

title:"Jazz Poetry"

You won't get any result because there's no record with those two consecutive terms in the title. Using a tilde followed by a number, which expresses a distance between terms, you can enable a proximity search, allowing matches of documents that have those two terms within a specific distance from one another.

This query will match the document that has Where Jazz meets Poetry as its title:

title:"Jazz Poetry"~2

The following query will also match the document that has A Modern Jazz Symposium of Music and Poetry as the title:

title:"Jazz Poetry"~4
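The distances used in these two queries can be checked with a deliberately simplified model. The Python sketch below handles only two terms that appear in the same order as in the query, and it assumes plain whitespace tokenization and lowercasing; real sloppy phrase matching in Lucene is more general (it also handles reordered terms and longer phrases).

```python
def min_gap(title, first, second):
    """Smallest number of words between first and second (in that order)."""
    tokens = title.lower().split()
    first_positions = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_positions = [i for i, t in enumerate(tokens) if t == second.lower()]
    gaps = [s - f - 1
            for f in first_positions
            for s in second_positions
            if s > f]
    return min(gaps) if gaps else None


def matches(title, first, second, slop):
    """True if the two terms occur in order with at most slop words between."""
    gap = min_gap(title, first, second)
    return gap is not None and gap <= slop


short_title = "Where Jazz meets Poetry"
long_title = "A Modern Jazz Symposium of Music and Poetry"

print(min_gap(short_title, "Jazz", "Poetry"))  # 1
print(min_gap(long_title, "Jazz", "Poetry"))   # 4
```

Under this model, a slop of 2 admits only the first title, while a slop of 4 admits both.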

Ranges

Range searches allow us to specify, for a given field, a set of matching values that fall between a lower and a higher bound, inclusive or exclusive of those bounds. Here are some examples of ranges:

released:[1957 TO 1988]

released:[1957 TO *]

released:[* TO 1988]

released:{1957 TO 1988}

released:[1957 TO 1988}

genre:[Jazz TO New Age]

You can see that the lower and higher bounds can be literal values, as shown in the first example, where we are searching for albums released between 1957 and 1988. The bounds can also be wildcards, as shown in the second and third examples. Square and curly brackets are used to denote an included or an excluded bound, respectively. So, in the first example, both 1957 and 1988 are included; in the fourth example they are excluded.

Keep in mind that, for non-numeric fields (as shown in the last example in the preceding code snippet), sorting is done lexicographically. Therefore, a sequence such as 1, 02, 14, 100 will result in 02, 1, 100, 14 using the lexicographic order, which is very different from a numeric sort.
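The difference between the two orderings is easy to reproduce. This short Python sketch sorts the same sequence as strings and as numbers; it says nothing about how Solr stores the values and is only meant to show why range bounds on string fields can behave unexpectedly.

```python
values = ["1", "02", "14", "100"]

# Strings are compared character by character.
lexicographic = sorted(values)

# The same values compared as integers.
numeric = sorted(values, key=int)

print(lexicographic)  # ['02', '1', '100', '14']
print(numeric)        # ['1', '02', '14', '100']
```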


The Disjunction Maximum query parser

The Solr query parser is powerful when it comes to building complex expressions. However, those are quite far from what the user usually types in a search field.

Think about the Google search page. What do you type in the search text field? Not an expression, but just one, two, or more terms associated with what you're looking for.

The Disjunction Max (DisMax) query parser directly processes those user-entered terms and searches for each of them across a set of configurable target fields, with a configurable weight for each field.

Note

The DisMax parser is enabled by setting the defType parameter to dismax.

The example Solr instance has a request handler listening to /glike1 that uses the DisMax parser.

Other than search terms, this query parser supports some features of the Solr query parser, such as quotes, which can be used to indicate phrases, and the + and - operators to mark mandatory and prohibited terms, respectively. All other term modifiers we saw for the Solr query parser are escaped, so they will be interpreted as search terms.

The name of the parser comes from its behavior:

• Dis: This stands for disjunction, which means that, for each word in the query string, the parser builds a new subquery across the fields and boosts specified in the qf parameter. The resulting queries are subjected to the first (required) constraint defined with the mm parameter, and to a set of optional clauses defined with other parameters, which we will see later.

• Max: This means maximum, and it pertains to the scoring computation. The DisMax parser scores a given document by getting the maximum score value among all matching subqueries.

The following sections describe the several parameters that the parser accepts.

Query fields

The qf parameter indicates a set of target fields with their corresponding (optional) boosts. Fields are separated by spaces, and each of them can have an optional boost associated with it, hence resulting in expressions such as this:

qf=title^3.5 artist^2.0 genre^1.5 released

Here, we want to search across four fields, each of them with a different importance, which will affect the score assigned to each matching document. The qf parameter is one of the main places where we define our search strategy, depending on customer requirements.

Tip

In OPACs (Online Public Access Catalogs), there's a never-ending debate about which is the more relevant attribute among titles and subjects. A title, as you can imagine, is important, but it might not contain terms that are representative of a work. A subject is a kind of controlled classification assigned by a professional user (that is, a librarian). As a search service provider, you can use the qf parameter to configure boosts, depending on customer needs, and avoid entering that debate!

The DisMax query parser has another interesting feature when searching fields declared in the qf parameter: when those fields are numeric or dates, inappropriate terms are dropped. Returning to the qf expression, consider searching for this:

Mingus 1962

For each of the title, artist, and genre fields, Solr will build two queries. But for the released field, it will create just one query using the 1962 word, thus resulting in a total of seven queries:

title:Mingus^3.5, artist:Mingus^2.0, genre:Mingus^1.5, title:1962^3.5, artist:1962^2.0, genre:1962^1.5, released:1962

As you can see, the released:Mingus query has been dropped because released is a numeric field.
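The expansion just described can be imitated in a few lines of Python. This is a rough sketch of the idea only: it lists the per-field clauses as strings, whereas the real parser groups the clauses of each term into a disjunction-max query. The helper names and the rule used to decide whether a term suits a numeric field (a simple digit check) are assumptions.

```python
def parse_qf(qf):
    """Split a qf expression into (field, boost) pairs; boost may be None."""
    fields = []
    for item in qf.split():
        name, _, boost = item.partition("^")
        fields.append((name, float(boost) if boost else None))
    return fields


def expand(user_query, qf, numeric_fields):
    """List the field:term clauses built for each user-entered term."""
    clauses = []
    for term in user_query.split():
        for name, boost in parse_qf(qf):
            # A non-numeric term is dropped for numeric fields.
            if name in numeric_fields and not term.isdigit():
                continue
            clause = f"{name}:{term}"
            if boost is not None:
                clause += f"^{boost}"
            clauses.append(clause)
    return clauses


clauses = expand("Mingus 1962",
                 "title^3.5 artist^2.0 genre^1.5 released",
                 numeric_fields={"released"})
print(len(clauses))  # 7
print(clauses)
```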

Alternative query

The optional q.alt parameter defines a query that will be used in the absence of the main query.

By default, the q.alt query is parsed using the Solr query parser, so it accepts the syntax we described in the previous section. Using LocalParams, you can change the q.alt parser.

Minimum should match

Every word or phrase that is a part of the search string, unless it is constrained by the + or - operators (and therefore marked as required or prohibited), is considered optional. For those optional parts, the mm parameter defines the minimum number of matches needed to satisfy the query. The interesting point here is that, other than accepting a quantity or a number, this parameter also allows complex expressions. The following table illustrates some examples of mm:

Value | Description

An integer (for example, 3) | At least the given number of optional clauses must match.

A percentage (for example, 66%) | At least the given percentage of optional clauses must match.

A negative number or a negative percentage | The number of optional clauses that must match is the result of subtracting the given value from the total number of optional clauses (absolute or 100 percent depending on the parameter value).

One or more expressions with the X<Y format | If there are X or fewer optional clauses, all of them must match. If there are more than X clauses, then Y is used as the mm value. Y can be a positive or negative integer or a percentage value. It is also possible to concatenate several expressions, like this: 3<75% 6<-1. This means that, with up to three optional clauses, all of them are required. Between 4 and 6 optional clauses, we require a match of 75 percent. Finally, for more than six clauses, we require a match of all clauses but one.

The several subqueries resulting from the parsing of search terms are constrained with the mm parameter (specifically, an additional Boolean query acting as a constraint is concatenated with the AND operator), so documents that don't satisfy the mm constraint won't be part of the search results.
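The mm rules can be turned into a small calculator. The following Python sketch follows the rules listed in the table; it assumes that percentages are rounded down and that the result is clamped between 0 and the number of optional clauses. The function name min_should_match is hypothetical, and this is not Solr's own code.

```python
def min_should_match(optional_clauses, spec):
    """Number of optional clauses that must match for a given mm spec."""
    spec = spec.strip()
    result = optional_clauses

    if "<" in spec:
        # Conditional expressions such as "3<75% 6<-1", evaluated left to right.
        for condition in spec.split():
            upper_bound, _, value = condition.partition("<")
            if optional_clauses <= int(upper_bound):
                return result
            result = min_should_match(optional_clauses, value)
        return result

    if spec.endswith("%"):
        percent = int(spec[:-1])
        amount = optional_clauses * abs(percent) // 100  # rounded down
        result = amount if percent >= 0 else optional_clauses - amount
    else:
        number = int(spec)
        result = number if number >= 0 else optional_clauses + number

    return max(0, min(result, optional_clauses))


for n in (2, 3, 4, 6, 7):
    print(n, min_should_match(n, "3<75% 6<-1"))
# 2 2
# 3 3
# 4 3
# 6 4
# 7 6
```

With seven optional clauses, the first condition yields 75 percent (five clauses), but the second condition also applies and replaces it with "all but one", that is, six.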

Phrase fields

Once the list of matching documents has been populated according to the search criteria and constraints (for example, mm or filter queries), the pf parameter raises the score of documents that have search terms in proximity.

Like the qf parameter, pf can declare a list of fields with an optional boost factor.

Query phrase slop

The qs parameter indicates a proximity factor to be used in the phrase queries, if any, that are explicitly included in the search string.

Phrase slop

The ps parameter indicates a proximity factor to be used in phrase queries built for pf fields. Note that such queries will be executed only to boost results (see the previous section), so this parameter doesn't affect matching but only boosting.

Boost queries

The bq parameter defines a query, parsed by the Solr query parser, that will additionally boost search results. It can be repeated, thus allowing one or more queries.

If, for example, you want to give more importance to items with a price that falls within a given range, you can use a boost query like this:

price:[10.00 TO 19]

Additive boost functions

The bf parameter defines a function that will additionally boost search results by adding its value to the computed score. As with the bq parameter, it can be repeated in order to have multiple functions.

Tie breaker

The tie parameter is a float number. It has a value between 0 and 1, and it affects the strategy used by the parser to determine the final score of a given (matching) document.

The Disjunction Max parser, as said before, executes a set of subqueries on top of the fields declared in the qf parameter. The subquery that has the maximum score determines the score of the document. So schematically:

documentScore = score of matching subquery with highest score

However, you could end up with two documents getting the same score, because the maximum value computed by each winning subquery is the same.

The tie parameter lets you take fine-grained control of the final score assigned to each document, by including the scores of all matching subqueries in the computation. Those additional scores are multiplied by a factor, the tie value. So, the preceding formula becomes the following:

documentScore = (score of matching subquery with highest score) + (tie) * (sum of the scores of the other matching subqueries)

With a value of 0.0, we will have a pure disjunction max query, where only the maximum score is included. A value of 1.0 will lead to a disjunction sum query, where the final score is the sum of the scores of all matching subqueries.
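The formula can be tried out with a few made-up scores. The following Python sketch implements exactly the schematic computation given above; the subquery scores are invented numbers, not values produced by Solr.

```python
def dismax_score(subquery_scores, tie=0.0):
    """Highest subquery score plus tie times the sum of the other scores."""
    if not subquery_scores:
        return 0.0
    best = max(subquery_scores)
    others = sum(subquery_scores) - best
    return best + tie * others


# Hypothetical scores of three matching subqueries for one document.
scores = [2.0, 1.0, 0.5]

print(dismax_score(scores, tie=0.0))  # 2.0  (pure disjunction max)
print(dismax_score(scores, tie=0.5))  # 2.75
print(dismax_score(scores, tie=1.0))  # 3.5  (disjunction sum)
```

Two documents whose best subquery scores are equal can therefore still be ranked differently as soon as tie is greater than 0 and their other subqueries score differently.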


The Extended Disjunction Maximum query parser

This parser (eDisMax) is built on top of the DisMax parser and has some additional features such as fielded search, Boolean operators, term modifiers, and better handling of mistakes in queries.

Note

The eDisMax parser can be enabled by setting the defType parameter to edismax.

The example Solr instance has a request handler listening to /glike2 that uses the eDisMax parser.

The following sections describe the additional parameters that this parser accepts. All parameters described in the DisMax parser section are included.

Fielded search

The eDisMax parser supports the full syntax of the Solr query parser, therefore allowing a so-called fielded search (for example, title:Jazz) with Boolean operators and term modifiers (for example, fuzzy and proximity).

In addition, this parser supports field aliasing and renaming. This allows you to give the requestor (for example, an end user, a query client, and so on) an interaction view that is partially or completely decoupled from Solr's underlying data model.

Aliasing is done using the following syntax:

f.<alias>.qf=(one or more real fields with optional boosts)

Here, <alias> is the virtual name that will be associated with the field (or fields) declared on the right-hand side. As you can see, an alias can be applied to single fields or to a group of fields. When aliases are declared, requestors can use them in their queries.

We can use aliases to localize field names:

f.artista.qf=artist // Italian users will see an "artista" field

f.kunstler.qf=artist // for German users

We can also use them to create metafields that group a set of real fields:

f.people.qf=author,illustrator,editor,translator

f.titles.qf=title,front_cover_title,sub_title,uniform_title

Phrase bigram and trigram fields

Other than supporting the pf parameter we have already seen for DisMax, this parser adds two optional features. The pf parameter boosts the score of documents where input terms appear in proximity. The pf2 and pf3 parameters offer the same feature, but by splitting the input terms into consecutive bigrams and trigrams, respectively. Therefore, the All the things you are input string will become the following set of (consecutive) bigrams:

All the, the things, things you, you are

By the same logic, it will become the following set of trigrams:

All the things, the things you, things you are
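The way the input is split into word bigrams and trigrams can be reproduced with a simple sliding window. This Python sketch assumes plain whitespace splitting and is only meant to show which word groups pf2 and pf3 would work on; the shingles helper is a name chosen for the example.

```python
def shingles(text, size):
    """Return all groups of `size` consecutive words in text."""
    words = text.split()
    return [" ".join(words[i:i + size])
            for i in range(len(words) - size + 1)]


phrase = "All the things you are"

print(shingles(phrase, 2))
# ['All the', 'the things', 'things you', 'you are']
print(shingles(phrase, 3))
# ['All the things', 'the things you', 'things you are']
```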

Phrase bigram and trigram slop

As ps sets the phrase slop for the pf parameter, ps2 and ps3 do the same for pf2 and pf3. If they are absent, the value of ps is used.

Multiplicative boost function

Like the bf parameter we have seen for the DisMax parser, the boost parameter declares a function. The difference here is that the function value is multiplied by (not added to) the computed score.

User fields

The uf parameter specifies which fields (real or virtual) the requestors are allowed to use in their queries. Used in conjunction with aliasing, it allows you to completely hide real fields and have queries with only virtual (that is, aliased) fields.

Lowercase operators

In plain Solr query parser syntax, operators need to be in uppercase (AND, OR). The lowercaseOperators flag parameter, which defaults to true, allows lowercase tokens (and, or) to be interpreted as operators.

Note

At the time of writing this book, only the and and or Boolean operators are affected by this parameter. The NOT operator is not handled, and therefore the lowercase word not is parsed as a literal term, even if lowercaseOperators is set to true. The Jira issue at https://issues.apache.org/jira/browse/SOLR-3580 tracks the activity on this topic.


Other available parsers

There are a lot of other available parsers, as listed in the following table:

Parser | Code | Description

Lucene query parser | lucene | The Lucene query parser has more or less the same features as the Solr query parser. However, this is the Lucene-specific implementation.

Function query parser | func | Creates a function query from the input string.

Join query parser | join | Normalizes relationships between documents by emulating a join.

Term query parser | term | Creates a single-term query from the input string.

Boost query parser | boost | Creates a boosted query from the input string. An additional parameter, b, is required to indicate the boost function.

Raw query parser | raw | Creates a term query from the input string without any text analysis.

Spatial filter query parser | geofilt | Enables spatial queries.

Field query parser | field | Creates a field query from the input string.

Surround query parser | surround | Creates a surround query. This query is used for proximity searches.

Besides all of this, the query parser framework has been conceived with extensibility in mind, so developers are free to implement, register, and use their own query parsers.


Search components

A search component is a reusable module that contributes to search results. While defining a search handler, that is, a controller for a given kind of search, you can customize its behavior by defining and configuring search components that will contribute to its output results.

Search components must be declared and used within solrconfig.xml, the main Solr configuration file. A component declaration requires a name, the implementation class, and a set of optional initialization parameters:

<searchComponent name="prices" class="a.b.c.MyComponent">
  <str name="ds-jndi">jdbc/datasource</str>
  <str name="service-uri">http://example.org#me</str>
</searchComponent>

Once declared, these can be used within request handlers, which are the runtime controllers of the execution of requests (we will cover request handlers later in the chapter).

There are some predefined search components that don't need to be explicitly declared in solrconfig.xml.

Note

That doesn't mean they are automatically enabled. They must be explicitly activated or disabled, depending on their default state.

The default components are those components that are responsible for carrying out the fundamental or common steps of a query execution flow. This is the reason there's no need to declare them explicitly, unless you want to use a different configuration. In the following sections, we will illustrate these components.


Query

The query component is responsible for parsing and executing a query. This is the component that accepts query and query parser parameters, gets a reference to the appropriate query parser, coordinates the parser in order to produce a query, executes that query, and outputs a corresponding response.


Facet

This component enables the so-called faceted search. It contributes to search results by adding a set of configurable aggregations called facets.

When you execute a search, you will get back a single page of results consisting of a certain number of matching documents. Enabling faceting allows you to get an additional perspective of the overall data, consisting of a set of aggregations.

[Screenshot: Solr-powered facets in action on the right side of a website]

The facet component can be activated by specifying a facet parameter with one of the following values: yes, true, or on.

Solr provides several types of facets: queries, fields, ranges, pivot, and interval. Each of them, whenever enabled, will add a dedicated section to the response.

Facet queries

The facet.query parameter declares a query (parsed by the Solr query parser) that will be used as a facet with the corresponding counts. The results (that is, counts) of this facet will be in a specific response section called facet_queries. The parameter can be repeated multiple times, allowing us to specify several queries. Using the example dataset, with Solr running, open a browser and type the following URL:

http://127.0.0.1:8983/solr/example/select?q=*:*&facet=true&facet.query=genre:Jazz

In the XML response, you will see matching documents within the <result> tag, and an additional section dedicated to facets:

<lst name="facet_counts">
  <lst name="facet_queries">
    <int name="genre:Jazz">3</int>
  </lst>
  <lst name="facet_fields"/>
  <lst name="facet_dates"/>
  <lst name="facet_ranges"/>
</lst>

Here, you can see that three documents match the facet query. The other facet sections are empty because we didn't ask for them.

Facet fields

Facet fields are surely the most popular kind of facets. They aggregate search results using a set of given and configurable fields.

Note

Remember that a field must be declared as indexed in the schema in order to be faceted.

Other than activating the facet feature for a given field, Solr has a rich set of parameters that can be used to tune and configure the field's faceting behavior. These settings can be specified for all fields or for a given field. For the first case, the following table illustrates the available parameters, their names, and meanings. For field-specific settings, the same parameters must be declared with the following convention:

f.<field>.<parameter>=<value>

In this way, the value associated with the parameter will be valid only for the specific field.

Parameter Description

facet.field Declares a field that will be used as a facet. This parameter must be repeated for each facet field.

facet.prefix Limits the terms used in faceting to values that begin with a given prefix.

facet.sort The sort strategy of counts within each facet. Only two values are allowed: count, which means order by count, and index, which means lexicographic order.

facet.limit The maximum number of counts that can be returned for each facet. A value of -1 will return all available counts.

facet.offset Specifies a start offset within the available counts of facets.

facet.mincount The minimum count needed for a facet value to be included in the response.

facet.missing Includes in the response the count of documents that match the query but don't have a value for a given facet.

facet.method The type of algorithm that Solr will use to compute facets.

facet.threads The number of parallel workers (that is, threads) that will compute the facets.

Returning to our previous example, let's remove the facet query and use some additional parameters so that facet fields will be built (for simplicity, only the query string is reported):

q=*:*&facet=on&facet.field=genre&facet.mincount=1

In the facet sections, you will see the genre facets under the facet_fields subsection:

<lst name="facet_fields">
  <lst name="genre">
    <int name="Progressive Rock">10</int>
    <int name="Rock">5</int>
    <int name="Fusion">4</int>
    <int name="Heavy Metal">4</int>
    <int name="Pop metal">1</int>
  </lst>
</lst>

We asked for the genre facet and we set mincount to 1, which means that facet values with no counts are excluded from the response. It is important to underline the fact that the displayed value for a facet field is its indexed value, and not the stored value (that is, the value that is copied verbatim as it arrives in input documents). In the previous example, the genre field is a String, and therefore it is not tokenized. This is the reason you see the compound term (Progressive Rock) as one of its values. If that field had been declared as a TextField and tokenized with WhitespaceTokenizer, you would have seen two different values for that facet (assuming no further filtering): Progressive and Rock.
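Global and field-specific facet settings can be combined in the same request. The following Python sketch builds such a query string; it assumes that the <parameter> part of the f.<field>.<parameter> convention is the full parameter name (for example, facet.limit), and the facet_params helper and the artist field are illustrative choices, not part of the sample project.

```python
from urllib.parse import urlencode


def facet_params(fields, global_settings=None, per_field=None):
    """Build the list of (name, value) pairs for a field-faceting request."""
    params = [("facet", "on")]
    params += [("facet.field", field) for field in fields]
    # Settings valid for every facet field, e.g. facet.mincount=1.
    for name, value in (global_settings or {}).items():
        params.append((f"facet.{name}", value))
    # Settings valid for one field only, e.g. f.genre.facet.limit=5.
    for field, settings in (per_field or {}).items():
        for name, value in settings.items():
            params.append((f"f.{field}.facet.{name}", value))
    return params


params = [("q", "*:*")] + facet_params(
    ["genre", "artist"],
    global_settings={"mincount": 1},
    per_field={"genre": {"limit": 5}},
)
query_string = urlencode(params, safe=":*")
print(query_string)
# q=*:*&facet=on&facet.field=genre&facet.field=artist&facet.mincount=1&f.genre.facet.limit=5
```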

Facet ranges

Facet ranges can be applied to numeric or date fields. As the name suggests, with facet ranges, Solr creates a facet classification based on ranges. The following parameters control this kind of faceting:

Parameter Description

facet.range Declares a field that will be used as the facet range. The parameter must be repeated for each facet field.

facet.range.start Declares the start of the facet interval.

facet.range.end Declares the end of the facet interval.

facet.range.gap The size of each step between the start and the end of the interval.

The following is a sample query that uses facet ranges for faceting albums by release date:

q=*:*&facet=on&facet.range=released&facet.range.start=1950&facet.range.end=2000&facet.range.gap=10

That will add another section within the facet_counts element:

<lst name="facet_ranges">
  <lst name="released">
    <lst name="counts">
      <int name="1950">1</int>
      <int name="1960">1</int>
      <int name="1970">6</int>
      <int name="1980">8</int>
      <int name="1990">5</int>
    </lst>
  </lst>
</lst>
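The way start, end, and gap partition the values can be mimicked outside Solr. In the Python sketch below, each bucket is identified by its lower bound and covers the interval from that bound (included) up to the next one (excluded), which is assumed here to be the default bound handling; the release years are invented sample data.

```python
def range_facet(values, start, end, gap):
    """Count values into buckets [lower, lower + gap) between start and end."""
    counts = {lower: 0 for lower in range(start, end, gap)}
    for value in values:
        if start <= value < end:
            lower = start + ((value - start) // gap) * gap
            counts[lower] += 1
    return counts


# Hypothetical release years of a few albums.
released = [1957, 1959, 1969, 1971, 1976, 1981, 1988, 1992, 2004]

print(range_facet(released, start=1950, end=2000, gap=10))
# {1950: 2, 1960: 1, 1970: 2, 1980: 2, 1990: 1}
```

The value 2004 falls outside the requested interval and is therefore not counted in any bucket.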

Pivot facets

We previously described facet fields; they provide the ability to aggregate search results by one or more categories. Pivot facets go a step further in that direction. They allow us to analyze data in multiple dimensions, breaking down the faceted values by subsequent, nested subcategories.

This kind of faceting can be activated through a request like this:

q=*:*&facet=on&facet.pivot=genre,released

The facet.pivot parameter can be repeated multiple times. For each repetition, there will be a dedicated and aggregated result within the facet_pivot section of the response. Here, for simplicity, we put just one parameter with two categories, genre and released. The following example is an extract of the response you will get using the sample instance associated with this chapter:

<lst name="facet_pivot">
  <arr name="genre,released">
    <lst>
      <str name="field">genre</str>
      <str name="value">Progressive Rock</str>
      <int name="count">10</int>
      <arr name="pivot">
        <lst>
          <str name="field">released</str>
          <int name="value">1992</int>
          <int name="count">2</int>
        </lst>
        <lst>
          <str name="field">released</str>
          <int name="value">1969</int>
          <int name="count">1</int>
        </lst>
        ...
      </arr>
    </lst>
    <lst>
      <str name="field">genre</str>
      <str name="value">Rock</str>
      <int name="count">5</int>
      <arr name="pivot">
        <lst>
          <str name="field">released</str>
          <int name="value">1969</int>
          <int name="count">1</int>
        </lst>
        <lst>
          <str name="field">released</str>
          <int name="value">1986</int>
          <int name="count">1</int>
        </lst>
        ...
      </arr>
    </lst>
    ...
  </arr>
</lst>

As you can see, the genre facet is broken down by a nested released category. Note that the preceding nested structure is returned with just one request-response interaction. To get the same result with classic facet fields, you would have to query Solr several times with incremental filters. That's why the pivot facets feature, acting as a façade and hiding all of that interaction complexity, is very useful for navigating the hierarchy of those aggregations. However, it should be used carefully, as it can have an impact on performance.
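To make the nested breakdown concrete, the following is a minimal in-memory sketch of the counting that a genre,released pivot performs. It is not Solr code: the Album class and the pivot method are hypothetical names introduced only for this illustration, and Solr computes these counts on the index, not on Java collections.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory illustration of a two-level (genre, released) pivot count. */
class PivotFacetSketch {

    /** Hypothetical stand-in for an indexed document. */
    static final class Album {
        final String genre;
        final int released;

        Album(String genre, int released) {
            this.genre = genre;
            this.released = released;
        }
    }

    /** Counts albums by genre and, within each genre, by release year. */
    static Map<String, Map<Integer, Integer>> pivot(List<Album> albums) {
        Map<String, Map<Integer, Integer>> counts = new TreeMap<>();
        for (Album album : albums) {
            counts.computeIfAbsent(album.genre, genre -> new TreeMap<>())
                  .merge(album.released, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Album> albums = Arrays.asList(
            new Album("Progressive Rock", 1992),
            new Album("Progressive Rock", 1992),
            new Album("Progressive Rock", 1969),
            new Album("Rock", 1969),
            new Album("Rock", 1986));
        // One pass produces the whole nested structure, as a single pivot request does.
        System.out.println(pivot(albums));
    }
}
```

Getting the same numbers with plain facet fields would mean one request per genre, each with its own filter, which is the interaction that pivot facets hide.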

Interval facets

Interval facets were introduced in Solr 4.10. They can be seen as an alternative to facet (range) queries because they allow you to set interval criteria for one or more fields, and count the number of matching documents that have values within those constraints.

Although the same result can be achieved with facet range queries, this implementation could provide a performance improvement in several contexts. As suggested in the Solr reference guide, it is recommended that you try both methods.
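As a rough illustration of what an interval facet counts, the following sketch tallies the values that fall within a half-open interval. The class and method names are assumptions made for this example only; Solr evaluates intervals against indexed field values, not Java arrays.

```java
/** In-memory illustration of counting values that fall inside an interval. */
class IntervalFacetSketch {

    /** Counts the values v such that lowInclusive <= v < highExclusive. */
    static long countInInterval(int[] values, int lowInclusive, int highExclusive) {
        long count = 0;
        for (int value : values) {
            if (value >= lowInclusive && value < highExclusive) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] released = {1969, 1969, 1986, 1992, 1992};
        System.out.println("sixties: " + countInInterval(released, 1960, 1970));
        System.out.println("eighties and nineties: " + countInInterval(released, 1980, 2000));
    }
}
```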


Highlighting

The highlight component contributes to search results by adding a section that contains (for each document in the current result page) a set of snippets highlighting the search terms that are in the document content (that is, in one or more fields of the document). The following screenshot shows a web application that uses the highlighting feature:

This feature is particularly useful when your data comes from rich documents such as PDFs or Microsoft Office documents (as shown in the preceding example). Using the highlighting feature, it's possible to give the end user an approximate idea of the context where, within the document, the entered terms have been found.

Tip

Within the example Solr instance associated with this chapter, there is a request handler called /highlight that enables this feature on the title and artist fields.

The highlighting component can be tuned, or configured, with several parameters. Fortunately, the provided default values work well in many scenarios. Some of those parameters are described in the following table:

Parameter | Description
hl | Turns highlighting off or on. The default value is false.
hl.q | Terms to be highlighted are taken from the main query unless this parameter, which itself requires a query, is specified.
hl.fl | A space- or comma-separated list of fields that will be used for highlighting. Snippets will come only from these fields.
hl.snippets | The number of highlighting snippets that will be returned. The default value is 1.
hl.maxAnalyzedChars | The maximum number of characters that will be inspected (in a given field) to compute the snippets.
hl.simple.pre / hl.simple.post | Indicate the text that should appear before and after a highlighted term. They default to the <em> and </em> HTML tags, respectively.
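The parameters in the preceding table are ordinary request parameters, so they only need to be URL-encoded and appended to the query string. The following is a small sketch of that step using only the JDK. The class name, the helper methods, and the chosen values (the title and artist fields, the <b> tags) are assumptions made for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Builds a URL-encoded query string that switches highlighting on. */
class HighlightQuerySketch {

    /** Highlighting parameters from the table; the values are illustrative. */
    static Map<String, String> highlightParams(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", query);
        params.put("hl", "true");
        params.put("hl.fl", "title,artist");
        params.put("hl.snippets", "2");
        params.put("hl.simple.pre", "<b>");
        params.put("hl.simple.post", "</b>");
        return params;
    }

    /** Joins the parameters with '&', encoding names and values. */
    static String toQueryString(Map<String, String> params) {
        StringBuilder builder = new StringBuilder();
        for (Map.Entry<String, String> entry : params.entrySet()) {
            if (builder.length() > 0) {
                builder.append('&');
            }
            builder.append(encode(entry.getKey())).append('=').append(encode(entry.getValue()));
        }
        return builder.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException exception) {
            throw new IllegalStateException("UTF-8 is always supported", exception);
        }
    }

    public static void main(String[] args) {
        System.out.println(toQueryString(highlightParams("blue")));
    }
}
```

The resulting string can be appended to the URL of a search handler, for example the /highlight handler of the sample instance.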

Solr comes with three different kinds of highlighters, described in the following sections.

Standard highlighter

This is the first highlighter that was introduced in Solr. Solr uses it by default. It is able to work on top of a lot of query types and doesn't have any special requirement on the fields to be highlighted. However, in order to speed up its work, termVectors should be turned on for those fields.

Fast vector highlighter

The fast vector highlighter is the second type of highlighter introduced in Solr. It requires that termVectors, termPositions, and termOffsets are turned on for each field that needs to be highlighted. That allows fast and scalable execution, especially with documents containing large amounts of text, but requires a lot of extra space for the index. In addition, it supports fewer query types than the standard highlighter.

The fast vector highlighter can be enabled by setting the hl.useFastVectorHighlighter parameter to true.

Note that, if the preceding flags are not set for the target fields, Solr will continue to use the standard highlighter.

Postings highlighter

This highlighter doesn't use term vectors, nor does it reanalyze the text to be highlighted. It only requires the storeOffsetsWithPositions flag to be set for the fields to be highlighted. Unlike the others, this highlighter must be explicitly declared in the solrconfig.xml file with the following declaration:

<searchComponent class="solr.HighlightComponent" name="highlight">
  <highlighting class="org.apache.solr.highlight.PostingsSolrHighlighter"/>
</searchComponent>

This is a good compromise, compared with the first two highlighters, in terms of performance and index space. The information (that is, the posting offsets) required by the storeOffsetsWithPositions flag is cheaper than term vectors in terms of memory and disk occupation. However, it is meant to highlight simple query terms, so it could produce unexpected or unwanted results with phrase queries.


More like this

The more like this search component allows us to find documents that are similar to a given document. There are several ways to use this feature in Solr:

MoreLikeThisHandler: This is a front controller that is completely dedicated to "more like this" requests. It accepts a query that identifies a document, and looks for similar documents according to a configured criterion.
MoreLikeThisHandler (with external input): This is similar to the previous case, but instead of taking a document as the input (matched by a given query), the text used to compute similarity can be directly passed or fetched from a URL.
MoreLikeThisComponent: As a search component, it will execute the similarity search for each document of the current result page, thus appending a more like this section to the Solr response, with a list of similar documents for each document. This is not really recommended because it could slow down overall query execution.

In general, the first type is the most widely used. More like this doesn't have special requirements for the fields that are to be used for the similarity computation. However, for best performance, term vectors should be enabled for them.

The following table illustrates the parameters accepted by this component:

Parameter | Description
mlt | Turns the more like this component off or on. It defaults to false.
mlt.count | The maximum number of similar documents that will be returned (for each document).
mlt.fl | The fields used for similarity. They should have term vectors enabled (recommended) or they need to be stored.
mlt.qf | A list of space- or comma-separated fields (already declared in mlt.fl) with corresponding boosts.
mlt.minwl / mlt.maxwl | The minimum and maximum word length boundaries. Words whose length falls outside these boundaries are ignored.
mlt.boost | A flag indicating whether the query will be boosted by the relevance of the interesting terms. It defaults to false.
mlt.mintf | This is the minimum term frequency boundary. It defaults to 2.
mlt.mindf | This is the minimum document frequency boundary. It defaults to 5.
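To see how the word length and term frequency boundaries narrow down the terms used for similarity, here is a deliberately simplified sketch. It is not the Lucene implementation: the class and method names are hypothetical, the tokenization is naive, and the document frequency boundary is left out because it requires an index.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Simplified selection of "interesting" terms from a piece of text. */
class MltTermSelectionSketch {

    /**
     * Keeps the words whose length lies within [minWordLen, maxWordLen]
     * and that occur at least minTermFreq times in the text.
     */
    static Set<String> interestingTerms(String text, int minTermFreq, int minWordLen, int maxWordLen) {
        Map<String, Integer> termFreq = new TreeMap<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (word.isEmpty() || word.length() < minWordLen || word.length() > maxWordLen) {
                continue; // outside the word length boundaries
            }
            termFreq.merge(word, 1, Integer::sum);
        }
        Set<String> selected = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : termFreq.entrySet()) {
            if (entry.getValue() >= minTermFreq) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        System.out.println(interestingTerms("Blue Train, blue note, a a a", 2, 3, 10));
    }
}
```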


Other components

Other than the components we saw in the previous sections, there are other built-in search components that are part of the Solr framework. Remember that, if you want to use them, they will have to be explicitly declared and configured within the Solr configuration.

The following is a short and non-exhaustive list of additional components:

Query elevation: This is used to give more importance to some results using a criterion that has nothing to do with the normal Solr scoring algorithm. The component lets you associate a given query with a corresponding list of most important results.
Terms: This provides access to the Lucene internal term dictionary.
Stats: This provides statistics on numeric fields.
Spellcheck: This provides spellchecking capabilities by means of n-gram analysis of indexed documents or external dictionaries. From a functional point of view, this component is used to build the so-called "Did you mean?" feature, offering alternative search suggestions in case of user mistakes.
TermVector: This adds term vectors (that is, term, frequency, position, offset, and IDF) of the matching documents to the response.
Debug: This adds debugging and explanatory information about the request execution.


Search handler

We saw request handlers in the previous chapter. There, we defined a request handler as a pluggable component that handles incoming requests. In that chapter, we were referring to update requests, that is, requests containing index update commands.

Here, we will focus our attention on SearchHandler, a special front controller used to handle incoming search requests. The SearchHandler class, although it could be seen as the supertype layer of all search handlers, is not abstract and it defines a standard search behavior.


Standard request handler

StandardRequestHandler is an empty subclass of SearchHandler, so at the time of writing this book, using either of them is basically the same. Request handlers are declared in the solrconfig.xml file, and they define search endpoints. Each instance is associated with a given name prefixed by a slash (the name must be unique), an implementation class, and a set of configuration parameters:

<requestHandler name="/mySearcher" class="solr.SearchHandler">
  (configuration)
</requestHandler>

With the sample Solr instance running, handlers declared in this way answer at URIs such as these:

http://localhost:8983/solr/example/query
http://localhost:8983/solr/example/facets
http://localhost:8983/solr/example/jazz

Configuring a SearchHandler instance means defining configuration parameters and (optionally) search components that will participate in the query execution chain.

Search components

Most of the time, unless you have a specific need, the search components that drive the logic of the search execution can be omitted because the following list will be automatically injected:

Code | Component
query | QueryComponent
facet | FacetComponent
mlt | MoreLikeThisComponent
highlight | HighlightComponent
stats | StatsComponent
debug | DebugComponent

Only the "query" component is enabled; the others need to be explicitly activated.

If the default chain is not what you need, it is possible to define a custom chain in the following way:

<arr name="components">
  <str>query</str>
  <str>facet</str>
  … other components follow
</arr>


This will completely replace the default chain. It is also possible to leave the default chain as it is and have additional prepended or appended components:

<arr name="first-components">
  <str>my_custom_component</str>
  … other components follow
</arr>

<arr name="last-components">
  <str>another_custom_component</str>
  … other components follow
</arr>

So, in general, the order of execution for search components will be the following:

Components declared as "first-components" (optional).
Components declared as "components". In their absence, the default chain will be used.
Components declared as "last-components" (optional).
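The ordering rule just described can be sketched in a few lines of plain Java. This is only an illustration of the rule as stated here, with hypothetical class and method names; the actual Solr implementation may treat some components (the debug component, for instance) specially.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Resolves the effective component chain from the three optional lists. */
class ComponentChainSketch {

    /** The default chain listed in the table of built-in components. */
    static final List<String> DEFAULT_CHAIN = Collections.unmodifiableList(
        Arrays.asList("query", "facet", "mlt", "highlight", "stats", "debug"));

    /** Any argument may be null, meaning the corresponding section was not declared. */
    static List<String> resolve(List<String> first, List<String> components, List<String> last) {
        List<String> chain = new ArrayList<>();
        if (first != null) {
            chain.addAll(first);
        }
        // An explicit "components" list replaces the default chain entirely.
        chain.addAll(components != null ? components : DEFAULT_CHAIN);
        if (last != null) {
            chain.addAll(last);
        }
        return chain;
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, null, Arrays.asList("prices")));
    }
}
```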

The following is an example declaration of StandardRequestHandler:

<requestHandler name="/jazz" class="solr.StandardRequestHandler">
  <!-- parameters that will always be applied to the incoming requests -->
  <lst name="invariants">
    <int name="rows">10</int>
  </lst>
  <!-- parameters that will always be added to the incoming requests -->
  <lst name="appends">
    <str name="fq">genre:jazz</str>
  </lst>
  <!-- default settings that can be overridden by the incoming requests -->
  <lst name="defaults">
    <str name="sort">title asc</str>
    <str name="echoParams">explicit</str>
    <str name="q">*:*</str>
    <bool name="facet">false</bool>
  </lst>
  <!-- This is a custom search component that will run after the default component chain -->
  <arr name="last-components">
    <str>prices</str>
  </arr>
</requestHandler>

Query parameters

The request handlers and the search components involved in the chain accept several parameters to drive their execution logic. These parameters (with corresponding values) can be declared in three different sections:

defaults: Parameter values will be used unless overridden by incoming requests
appends: Parameter values will be appended to each request
invariants: Parameter values will always be applied and cannot be overridden by incoming requests or by the values declared in the defaults and appends sections
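The precedence among these three sections can be sketched as a simple merge. The following code is an illustration of the rules described above, not the way Solr implements them; the class, the effective method, and the params helper are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Computes the effective parameters of a request for a configured handler. */
class HandlerParamsSketch {

    /** Builds a multi-valued parameter map from name/value pairs. */
    static Map<String, List<String>> params(String... nameValuePairs) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        for (int i = 0; i + 1 < nameValuePairs.length; i += 2) {
            params.computeIfAbsent(nameValuePairs[i], name -> new ArrayList<>())
                  .add(nameValuePairs[i + 1]);
        }
        return params;
    }

    static Map<String, List<String>> effective(
            Map<String, List<String>> request,
            Map<String, List<String>> defaults,
            Map<String, List<String>> appends,
            Map<String, List<String>> invariants) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        // 1. Defaults are the starting point.
        defaults.forEach((name, values) -> result.put(name, new ArrayList<>(values)));
        // 2. Values sent with the request override the defaults.
        request.forEach((name, values) -> result.put(name, new ArrayList<>(values)));
        // 3. Appends are added to whatever is already there.
        appends.forEach((name, values) ->
            result.computeIfAbsent(name, n -> new ArrayList<>()).addAll(values));
        // 4. Invariants win over everything else.
        invariants.forEach((name, values) -> result.put(name, new ArrayList<>(values)));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(effective(
            params("wt", "xml", "fq", "released:1969"),
            params("defType", "edismax"),
            params("fq", "genre:jazz"),
            params("wt", "json")));
    }
}
```

In this run, the client asks for wt=xml, but the invariant forces json, while the appended filter query is combined with the one sent by the client.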

All sections are optional, so you can have no parameters configured for a given handler and allow the incoming requests to define them. This is an example of a handler configuration:

<lst name="defaults">
  <str name="defType">edismax</str>
</lst>
<lst name="appends">
  <str name="facet.field">artist</str>
  <str name="facet.field">genre</str>
</lst>
<lst name="invariants">
  <str name="wt">json</str>
  <bool name="facet">true</bool>
</lst>


RealTimeGetHandler

RealTimeGetHandler is basically a SearchHandler subclass that adds RealTimeGetComponent to the search request execution. In this way, it's possible to retrieve the latest version of softly committed documents by specifying their identifiers.

In order to enable such a component, you must turn the update log feature on, in solrconfig.xml:

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
</updateHandler>

Then the request handler can be declared and configured using the procedure that we saw in the previous section:

<requestHandler name="/get" class="solr.RealTimeGetHandler">
</requestHandler>

This handler accepts an additional id or ids parameter that allows us to specify the identifiers of the documents we want to retrieve. The id parameter accepts one identifier and can be repeated in requests. The ids parameter accepts a comma-separated list of identifiers.

Tip

Once the example Solr instance is up, this handler responds to /get requests.


Response output writers

As a last step, query results are returned to requestors in a given format. Solr communicates with clients using the HTTP protocol. Those clients are free to start the interaction by asking for one format or another, depending on their needs.

Although a default format can be set, the client can override it by means of the wt parameter. The value of the wt parameter is a mnemonic code associated with an available response writer.

There are several built-in response writers in Solr, which are described here:

Response Writer | Description
xml | The eXtensible Markup Language response writer. This is the default writer.
xslt | Combines the XML results with an XSLT file in order to produce custom XML documents.
json | JavaScript Object Notation response writer.
csv | Comma-Separated Values response writer.
velocity | This uses Apache Velocity to directly build web pages with query results. It is very useful for fast prototyping.
javabin | Java clients have a privileged way to obtain results from Solr using this response writer, which directly outputs Java objects.
python, ruby, php | Specialized response writers for these languages that produce a structure directly tied to the language requirements.


Extending Solr

The following sections will describe and illustrate a couple of ways of extending and customizing searches in Solr.


Mixing real-time and indexed data

Sometimes, as a part of your search results, you may want to have data that is not managed by Solr but retrieved from a real-time source, such as a database.

Think of an e-commerce application; when you search for something, you will see two pieces of information beside each item:

Price: This could be the result of some kind of frequently updated marketing policy. Non-real-time information could cause problems on the vendor side (for example, a wrong price policy could be applied).
Availability: Here, wrong information could cause an invalid claim from customers; for example, "I bought that book because I saw it as available, but it isn't!"

This is a good scenario for developing a search component. We will create our search component and associate it with a given RequestHandler.

A search component is basically a class that extends (not surprisingly) org.apache.solr.handler.component.SearchComponent:

public class RealTimePriceComponent extends SearchComponent

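The initialization of the component is done in a method called init. Here, most probably we will get the JNDI name of the target data source from the configuration. This source is where the prices must be retrieved from:

public void init(NamedList args) {
  String dsName = SolrParams.toSolrParams(args).get("ds-name");
  try {
    Context ctx = new InitialContext();
    this.datasource = (DataSource) ctx.lookup(dsName);
  } catch (NamingException exception) {
    // Without its data source the component cannot work
    throw new IllegalArgumentException("Cannot find data source " + dsName, exception);
  }
}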

Now we are ready to process the incoming requests. This is done in the process method, which receives a ResponseBuilder instance, the object we will use to add the component contribution to the search output. Since this component will run after the query component, it will find a list containing query results in ResponseBuilder. For each item within those results, our component will query the database in order to find a corresponding price:

public void process(ResponseBuilder builder) throws IOException {
  SolrIndexSearcher searcher = builder.req.getSearcher();

  // Holds the component contribution
  NamedList contrib = new SimpleOrderedMap();

  for (DocIterator iterator = builder.getResults().docList.iterator(); iterator.hasNext();) {
    // This is the Lucene internal document id
    int docId = iterator.nextDoc();

    // fieldset (declared elsewhere in the class) lists the fields to load; only "id" is needed here
    Document ldoc = searcher.doc(docId, fieldset);

    // This is the Solr document id
    String id = ldoc.get("id");

    // Get the price of the item
    BigDecimal price = getPrice(id);

    // Add the price of the item to the component contribution
    contrib.add(id, price);
  }

  // Add the component contribution to the response builder
  builder.rsp.add("prices", contrib);
}

In solrconfig.xml, we must declare the component in two places. First, we must declare and configure it in the following manner:

<searchComponent name="prices" class="a.b.c.RealTimePriceComponent">
  <str name="ds-name">jdbc/prices</str>
</searchComponent>

Then it has to be enabled in request handlers (as shown in the following snippet). Since this component is supposed to contribute to a set of query results, it must be placed after the query component:

<requestHandler name="/xyz" …>
  <arr name="last-components">
    <str>prices</str>
  </arr>
</requestHandler>

Done! If you run a query invoking the /xyz request handler, you will see, after the query results, a new section called prices (the name we used for the search component). This reports the document id and the corresponding price for each document in the search results.

Tip

You can find the source code of the entire example in the src folder of the project associated with this chapter, under the org.gazzax.labs.solr.ase.ch3.sp package.

If you want to start Solr with that component, just run the following command from the command line or from Eclipse:

mvn clean install cargo:run -P custom-search-component


Using a custom response writer

In a project I was working on, we implemented the autocomplete feature, that is, a list of suggestions that quickly appears under the search field each time a user types a key. Thus, the search string is gradually composed. The following screenshot shows this feature:

A new response writer was implemented because the user interface widget had already been built by another company, and the exchange format between that widget and the search service had already been defined.

Doing that in Solr is very easy. A response writer is a class that implements org.apache.solr.response.QueryResponseWriter. Like all Solr components, it can be optionally initialized using an init callback method, and it provides a write method where the response should be serialized according to a given format:

public void write(
    Writer writer,
    SolrQueryRequest request,
    SolrQueryResponse response) throws IOException {

  // 1. Get a reference to the values that make up the current response
  NamedList elements = response.getValues();

  // 2. Use a StringBuilder to build the output
  StringBuilder builder = new StringBuilder("{")
    .append("query:'")
    .append(request.getParams().get(CommonParams.Q))
    .append("',");

  // 3. Get a reference to the object which holds the query result
  Object value = elements.getVal(1);
  if (value instanceof ResultContext) {
    ResultContext context = (ResultContext) value;

    // The ordered list (actually the page subset) of matched documents
    DocList ids = context.docs;
    if (ids != null) {
      SolrIndexSearcher searcher = request.getSearcher();
      DocIterator iterator = ids.iterator();
      builder.append("suggestions:[");

      // 4. Iterate over documents
      for (int i = 0; i < ids.size(); i++) {
        // 5. For each document we need to get the "label" attribute
        Document document = searcher.doc(iterator.nextDoc(), FIELDS);
        if (i > 0) { builder.append(","); }

        // 6. Append the label value to the writer output
        builder
          .append("'")
          .append(((String) document.get("label")))
          .append("'");
      }
      builder.append("]").append("}");
    }
  }

  // 7. Finally, write out the result
  writer.write(builder.toString());
}

That's all! Now try issuing a query like this:

http://127.0.0.1:8983/solr/example/auto?q=ma

Solr will return the following response:

{
  query:'ma',
  suggestions:['Marcus Miller','Michael Manring','Got a match','Nigerian Marketplace','The Crying machine']
}

Tip

You can find the source code of the entire example under the org.gazzax.labs.solr.ase.ch3.rw package of the source folder in the project associated with this chapter.

If you want to start Solr with that writer, run the following command from the command line or from Eclipse:

mvn clean install cargo:run -P custom-response-writer


Troubleshooting

This section provides help, tips, and suggestions about difficulties that you could meet while experimenting with what we described in this chapter.


Queries don't match expected documents

There's no single answer to this big and popular question. Without any additional information, the first two things I would do are as follows:

Retry the query by appending debug parameters (for example, debugQuery and explainOther) and analyze the explain section. There's a wonderful online tool (http://explain.solr.pl) that makes life easier by explaining debug information.
Use the field analysis page, type some sample values, and see what happens at index and query time. Probably, your analyzer chains are not consistent.


Mismatch between index and query analyzer

Using different analyzer chains at index and query time sometimes causes problems because the tokens produced at query time don't match, as one would expect, the tokens produced at index time. The field analysis page helps a lot in debugging these situations. Type a value for a field and see what happens at query and index time. In addition, this page can highlight all the matches between index and query tokens.
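A tiny stand-alone sketch shows why such a mismatch leads to missing results. The two "analyzers" below are hypothetical and intentionally trivial (they are not Solr analyzers): the index-time chain lowercases tokens, while the query-time chain does not.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Shows how inconsistent index and query analysis prevents matches. */
class AnalyzerMismatchSketch {

    /** Hypothetical index-time chain: whitespace tokenizer plus lowercase filter. */
    static List<String> indexAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            tokens.add(token.toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    /** Hypothetical query-time chain: whitespace tokenizer only. */
    static List<String> queryAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            tokens.add(token);
        }
        return tokens;
    }

    /** A document matches only if every query token equals some indexed token. */
    static boolean matches(String indexedValue, String queryText) {
        Set<String> indexed = new HashSet<>(indexAnalyze(indexedValue));
        for (String token : queryAnalyze(queryText)) {
            if (!indexed.contains(token)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matches("Charles Mingus", "Mingus")); // query token keeps its capital letter
        System.out.println(matches("Charles Mingus", "mingus"));
    }
}
```

The indexed token is "mingus", so the query token "Mingus" finds nothing until the query chain applies the same lowercase step.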


No score is returned in response

The score field is a virtual field that must be explicitly asked for in requests. A value of * in the fl parameter is not enough because * means "all real fields." A request for all real fields that also includes the score must provide an fl parameter with the value of *,score. Note that this is valid in general for all virtual fields (for example, functions, transformers, and so on).


Summary

In this chapter we met the Solr search capabilities, a huge set of features that power information retrieval on Solr. We saw a lot of tools used to improve the search experience of clients, requestors, and last but not least, end users. After examining the indexing phase, you can well imagine that search and information retrieval constitute the actual functional goals of a full-text search platform.

We met the different pieces that make up Solr's search capabilities: analyzers, tokenizers, query parsers, search components, and output writers. For all of them, Solr provides a good set of alternatives, already implemented and ready to use. For those who have specific requirements, it is always possible to create customizations and extensions.

In the next chapter, keeping in mind the big picture of the crucial phases in an information retrieval system, we will take a look at client APIs. The available libraries are great examples of how to use Solr's HTTP services to work programmatically with it on the client side.


Chapter 4. Client API

A search application needs to interact with Solr by issuing index and search requests. Although Solr exposes these services through HTTP, working at that (low) level is not so easy for a developer. Client APIs are façade libraries that hide the low-level details of client-server communication. They allow us to interact with Solr using client-native constructs and structures such as the so-called Plain Old Java Object (POJO) in the Java programming language.

In this chapter we will describe Solrj, the official Solr client Java library. We will also describe the structure and the main classes involved in index and search operations. The chapter will cover the following topics:

Solrj: the official Java client library
Other available bindings


Solrj

Solrj is the name of the official Solr Java client. It completely abstracts the underlying (HTTP) transport layer and offers a simple interface to client applications to interact with Solr.


SolrServer: the Solr façade

A client library necessarily needs a façade or a proxy, that is, an object representing the remote resource that hides and abstracts the low-level details of client-server interaction. In Solrj, this role is played by classes that extend the org.apache.solr.client.solrj.SolrServer abstract class. At the time of writing this book, these are the available SolrServer implementers:

EmbeddedSolrServer: This connects to a local SolrCore without requiring an HTTP connection. This is not recommended in production but is definitely useful for unit tests and development.
HttpSolrServer: This is a proxy that connects to a remote Solr using an HTTP connection.
LBHttpSolrServer: A proxy that wraps multiple HttpSolrServer instances and implements client-side, round-robin load balancing between them. It also periodically checks the (running) state of each server, removing members from or adding them back to the round-robin list as needed.
ConcurrentUpdateSolrServer: This is a proxy that uses an asynchronous queue to buffer input data (that is, documents). Once a given buffer threshold is reached, data is sent to Solr using a configurable number of dequeuer threads.
CloudSolrServer: A proxy used to communicate with SolrCloud.
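The round-robin behavior described for LBHttpSolrServer can be illustrated with a small stand-alone sketch. This is not the Solrj implementation: the class and its methods are hypothetical, servers are represented by plain strings, and the periodic health check is reduced to explicit markDead and markAlive calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Client-side round-robin selection that skips servers marked as dead. */
class RoundRobinSketch {

    private final List<String> servers;
    private final Set<String> dead = new HashSet<>();
    private int next = 0;

    RoundRobinSketch(List<String> servers) {
        this.servers = new ArrayList<>(servers);
    }

    /** In a real client, a background check would call these two methods. */
    void markDead(String server) { dead.add(server); }

    void markAlive(String server) { dead.remove(server); }

    /** Returns the next live server in rotation, or fails if none is available. */
    String pick() {
        for (int attempts = 0; attempts < servers.size(); attempts++) {
            String candidate = servers.get(next);
            next = (next + 1) % servers.size();
            if (!dead.contains(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("No live servers available");
    }

    public static void main(String[] args) {
        RoundRobinSketch balancer = new RoundRobinSketch(Arrays.asList("solr1", "solr2", "solr3"));
        balancer.markDead("solr2");
        for (int i = 0; i < 4; i++) {
            System.out.println(balancer.pick());
        }
    }
}
```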

Although all the SolrServer implementers mentioned previously offer the same functionalities, HttpSolrServer and LBHttpSolrServer are better suited for issuing queries, while ConcurrentUpdateSolrServer is recommended for update requests.

Tip

The test case, org.gazzax.labs.solr.ase.ch3.index.SolrServersITCase, contains several methods that demonstrate how to index data using different types of servers.


Input and output data transfer objects

As described in the previous chapters, a Document is a central concept in Solr. It represents an atomic unit of information exchanged between the client and the server. The Solr API separates input documents from output documents using the SolrInputDocument and SolrDocument classes, respectively.

Although they share basic data transfer object behavior, each of them has its own specific features associated with the direction of the client-server interaction in which it plays a part.

SolrInputDocument is a write object. You can add, change, and remove fields in it. You can also set a name, value, and optional boost for each of them:

public void addField(String name, Object value)
public void addField(String name, Object value, float boost)
public void setField(String name, Object value)
public void setField(String name, Object value, float boost)

SolrDocument is the output data transfer object, and it is primarily intended as a query result holder. Here, you can get field values, field names, and so on:

public Object getFieldValue(String name)
public Collection<Object> getFieldValues(String name)
public Object getFirstValue(String name)

Within an UpdateRequestProcessor instance, or while adding data to Solr, we will use SolrInputDocument instances. In QueryResponse (that is, the result of a query execution), we will find SolrDocument instances.
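The difference between addField (which accumulates values) and setField (which replaces them) is easy to miss. The following stand-alone sketch mimics that behavior with a plain map. It is a hypothetical stand-in written for this illustration, not the Solrj classes themselves, and it ignores boosts.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Minimal multi-valued document that mimics addField and setField semantics. */
class MultiValuedDocSketch {

    private final Map<String, List<Object>> fields = new LinkedHashMap<>();

    /** Replaces any existing values of the field with the given one. */
    void setField(String name, Object value) {
        List<Object> values = new ArrayList<>();
        values.add(value);
        fields.put(name, values);
    }

    /** Appends a value, turning the field into a multi-valued one if needed. */
    void addField(String name, Object value) {
        fields.computeIfAbsent(name, key -> new ArrayList<>()).add(value);
    }

    /** Returns all the values of the field, or null if the field is absent. */
    Collection<Object> getFieldValues(String name) {
        return fields.get(name);
    }

    /** Returns the first value of the field, or null if the field is absent. */
    Object getFirstValue(String name) {
        List<Object> values = fields.get(name);
        return (values == null || values.isEmpty()) ? null : values.get(0);
    }

    public static void main(String[] args) {
        MultiValuedDocSketch doc = new MultiValuedDocSketch();
        doc.setField("id", 1234);
        doc.addField("genre", "Rock");
        doc.addField("genre", "Progressive Rock");
        System.out.println(doc.getFieldValues("genre"));
    }
}
```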

Tip

All the examples in the sample project associated with this chapter make extensive use of these data transfer objects.


Adds and deletes

Once a valid reference to a SolrServer has been created, adding data to Solr is very easy. The SolrServer class defines several methods to do this:

UpdateResponse add(SolrInputDocument document)
UpdateResponse add(Collection<SolrInputDocument> documents)

So we first create one or more SolrInputDocument instances filled with the appropriate data:

final SolrInputDocument doc1 = new SolrInputDocument();
doc1.setField("id", 1234);
doc1.setField("title", "Delicate Sound of Thunder");
doc1.addField("genre", "Rock");
doc1.addField("genre", "Progressive Rock");

Then, using the proxy instance, we can add that data:

solrServer.add(doc1);

Finally, we can commit:

solrServer.commit();

We can also accumulate all the documents within a list and use that as the argument of the add method.

Following the same logic as described in the second chapter for REST services, SolrServer provides the following methods to delete documents:

UpdateResponse deleteById(String id)
UpdateResponse deleteById(String id, int commitWithinMs)
UpdateResponse deleteById(List<String> ids)
UpdateResponse deleteById(List<String> ids, int commitWithinMs)
UpdateResponse deleteByQuery(String query)
UpdateResponse deleteByQuery(String query, int commitWithinMs)

Tip

The org.gazzax.labs.solr.ase.ch3.index.SolrServersITCase test case contains several methods that illustrate how to index and delete data.


Search

Searching with Solrj requires knowledge of (mainly) two classes: org.apache.solr.client.solrj.SolrQuery and org.apache.solr.client.solrj.response.QueryResponse. The first is an object representation of a query that can be sent to Solr. It allows us to inject all parameters we described in the previous chapter. One way of doing this is by providing dedicated methods, such as these:

SolrQuery setQuery(String query)
SolrQuery setRequestHandler(String qt)
SolrQuery addSort(String field, ORDER order)
SolrQuery setStart(Integer start)
SolrQuery setFacet(boolean b)
SolrQuery addFacetField(String... fields)
SolrQuery setHighlight(boolean b)
SolrQuery setHighlightSnippets(int num)

Alternatively, generic setter methods can be provided:

SolrQuery setParam(String name, String... values)
SolrQuery setParam(String name, boolean value)

Note that all the preceding methods return the same SolrQuery object, thus allowing a caller to chain method calls, like this:

SolrQuery query = new SolrQuery()
    .setQuery("Charles Mingus")
    .setFacet(true)
    .addFacetField("genre")
    .addSort("title", ORDER.asc)
    .addSort("released", ORDER.desc)
    .setHighlight(true);

Once a SolrQuery has been built, we can use the appropriate method in the SolrServer proxy to send the query request:

QueryResponse query(SolrParams params)

The method returns a QueryResponse, which is an object representation of the response that Solr sent back as a result of the query execution. With that object, we can get the list of SolrDocuments of the currently returned page. We can also get facets and their values, and in general, we can inspect and access any part of the response.

Tip: The org.gazzax.labs.solr.ase.ch3.search.SearchITCase test case contains several examples that demonstrate how to query with Solrj.

The following is an example of the use of QueryResponse:

import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.FacetField.Count;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

// Executes a query and gets the corresponding response
QueryResponse res = solrServer.query(aQuery);

// Gets the request execution elapsed time
long elapsedTime = res.getElapsedTime();

// Gets the results (i.e. a page of results)
SolrDocumentList results = res.getResults();

// How many total hits for this response (getNumFound returns a long)
long totalHits = results.getNumFound();

// Iterates over the current page
for (SolrDocument document : results) {
    // Do something with the current document
    // (getFieldValue returns Object, hence the cast)
    String title = (String) document.getFieldValue("title");
}

// Gets the facet field "genre"
FacetField genre = res.getFacetField("genre");

// Iterates over the facet values
for (Count count : genre.getValues()) {
    String name = count.getName();        // e.g. Jazz
    long occurrences = count.getCount();  // e.g. 19
}

// The highlighting section is a bit complicated, as the
// value object is a composite map where keys are the document identifiers
// while values are maps with highlighted fields as keys and snippets (a list
// of snippets) as values.
Map<String, Map<String, List<String>>> hl = res.getHighlighting();

// Iterates over the highlighting section
for (Entry<String, Map<String, List<String>>> docEntry : hl.entrySet()) {
    String docId = docEntry.getKey();

    // Iterates over highlighted fields
    for (Entry<String, List<String>> fieldEntry : docEntry.getValue().entrySet()) {
        String fieldName = fieldEntry.getKey();

        // Iterates over snippets
        for (String snippet : fieldEntry.getValue()) {
            // Do something with the snippet
        }
    }
}


Other bindings

Solrj is a very powerful client API, but of course, it is only available for Java clients. Since Solr services are exposed using standard HTTP procedures, other client API implementations have been created for other languages. Hence, it is possible to interact with Solr using Python, Perl, Ruby, .NET, or your favorite programming language.

The following table lists some of them, together with their location (only Solrj is a part of the Solr distribution; all other client libraries are independent projects):

Project Language Address

sunburnt Python https://pypi.python.org/pypi/sunburnt

pysolr Python https://pypi.python.org/pypi/pysolr/3.2.0

solrcloudpy Python https://pypi.python.org/pypi/solrcloudpy

solr-ruby Ruby https://github.com/erikhatcher/solr-ruby-flare/tree/master/solr-ruby

Blacklight Ruby http://projectblacklight.org

Solarium PHP http://www.solarium-project.org/

Solr-PHP-UI PHP http://www.opensemanticsearch.org/solr-php-ui/

PECL/Solr PHP http://pecl.php.net/package/solr

Flux Clojure https://github.com/mwmitchell/flux

solr-scala-client Scala https://github.com/takezoe/solr-scala-client

SolrNet .NET https://github.com/mausch/SolrNet

A complete and updated list of all bindings is available at https://wiki.apache.org/solr/IntegratingSolr.
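Because every binding ultimately talks to Solr over plain HTTP, it can help to see what such a request looks like without any client library. The following is only a sketch that uses the JDK alone: the class and method names (QueryUrlSketch, buildUrl) are hypothetical, and the endpoint is assumed to be the /query handler of the "example" core on a local instance. The parameter names (q, facet, facet.field, sort, hl) are standard Solr request parameters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryUrlSketch {

    // Builds "<handlerUrl>?name=value&..." with URL-encoded names and values.
    static String buildUrl(String handlerUrl, Map<String, String> params) {
        StringBuilder url = new StringBuilder(handlerUrl);
        char separator = '?';
        for (Map.Entry<String, String> param : params.entrySet()) {
            url.append(separator)
               .append(encode(param.getKey()))
               .append('=')
               .append(encode(param.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "Charles Mingus");
        params.put("facet", "true");
        params.put("facet.field", "genre");
        params.put("sort", "title asc");
        params.put("hl", "true");
        // Assumed endpoint: the "example" core of a local Solr instance.
        System.out.println(buildUrl("http://127.0.0.1:8983/solr/example/query", params));
    }
}
```

A client library adds response parsing, connection handling, and typed objects on top of this, which is the complexity the bindings listed above are meant to hide.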


Summary

A distributed search system, such as Solr, requires remote service invocations to send and receive data across a network. Clients without appropriate APIs will be exposed to the complexity of dealing with low-level details of the communication protocol.

Since Solr provides all core services through HTTP, a lot of client libraries have been developed to hide that complexity. Regardless of the concrete binding, a client library encapsulates the low-level details of client-server communication and provides a uniform service interface for clients.

In this chapter, we focused on the Solr client APIs, specifically on the official Java binding called Solrj, its main features, and the main classes involved in index and query operations.

We briefly described and listed some other popular bindings that have been developed on top of the Solr HTTP services.

In the next chapter, we will return to the server side to describe how to fine-tune and manage a Solr instance.


Chapter 5. Administering and Tuning Solr

You can manage a Solr installation using any of the several system administration tools provided with Solr. The system administration tools include the Administration Console, the REST services, and the JMX API, with which you manage and monitor cores, hardware resources, runtime configuration, and the health of the Solr environment to ensure maximum availability and performance.

Although the topic of administration is usually outside the scope of a developer sphere, most probably you, as a provider of a solution based on Solr, will need to know something about it. Specifically, you need to know about a set of tools that let you monitor Solr, tune it, and investigate troubles.

Throughout this chapter, we will use a Solr instance preloaded with sample data. In order to have that up and running, you should check out the source code of the book, go to the ch5 folder, and run this (using Eclipse or from the command line):

# mvn clean install cargo:run

Tip: The ch5 sample project has a preconfigured Eclipse launcher used to run Solr. You can find it under the src/dev/eclipse folder. Just right-click on start-ch5-server.launch and select the Debug as menu item.

This chapter will describe the most relevant sections of the Solr administration console. We will also explore the JMX API. Each time a hardware resource is involved, we will talk about it. Specifically, this chapter will cover the following topics:

The Solr Administration Console
Usage of hardware resources
JConsole and JMX


Dashboard

The Administration Console is a web application that is part of Solr. You can access the Administration Console from any machine on the local network that can communicate with Solr, through a web browser.

Type http://127.0.0.1:8983/solr in the web browser’s address bar. The first page that appears is the dashboard, as shown in the following screenshot:

This is where you can see general information about Solr (for example, the version, startup time, and so on) and about its hosting environment (for example, JVM version, JVM args, processors, physical and JVM memory, and file descriptors).


Physical and JVM memory

The first and the last gray bars on the right side of the dashboard represent the physical and JVM memory, respectively. The first measure is the amount of the memory that is available in the hosting machine. The second measure is the amount assigned to the JVM at startup time by means of the -Xms and -Xmx options.

Tip: For a complete list of available JVM options, see https://docs.oracle.com/cd/E22289_01/html/821-1274/configuring-the-default-jvm-and-java-arguments.html.

Each bar reports both the available amount and used amount of memory. As you can imagine, memory is one of the crucial factors concerning Solr performance and response times.

When we think about a web application, we may consider it as a standalone container that, for example, reads data from an external database and shows some dynamic pages to the end users. Solr is not like that; it is a service. Despite its web-application-like nature, it makes extensive use of local hardware resources such as disk and memory.

Memory (here, I’m referring to the JVM memory) is used by Solr for a lot of things (for example, caches, sorting, faceting, and indexing) so understanding all those mechanisms is crucial to determine the right amount of memory one should assign to the JVM.

Note: There’s a useful spreadsheet (although we already mentioned this in the first chapter) that you can find in the Solr source repository at https://svn.apache.org/repos/asf/lucene/dev/trunk/dev-tools/size-estimator-lucene-solr.xls. It is a good starting point from which to estimate RAM and disk space requirements.

However, a resource that is often considered as external to the Solr domain is the system memory, that is, the remaining memory available for the operating system once the JVM memory has been deducted.

In an optimal situation, that kind of memory should be enough to:

Let the operating system manage its resources.
Accommodate the Solr index. Ideally, if it is able to contain the whole index, there won’t be any disk seek.

The first point is quite obvious; an operating system needs a given amount of memory to manage its ordinary tasks.

The second point has to do with the so-called (OS) filesystem cache. The JVM works directly with the memory that we made available in the startup command line by means of the -Xms and -Xmx options. This is the memory we are using in our Java application to load object instances, implement application-level caches, and so on.


However, applications such as Solr that widely use filesystem resources (to load and write index files) also rely on another important part of the memory that is available for the operating system and is used to cache files. Once a file is loaded, its content is kept in memory until the system requires that space for other purposes. Data in this filesystem cache provides quick access, without requiring disk accesses and seeks.

Note: Remember that this type of memory has nothing to do with the memory assigned to the JVM.

As you can imagine, this aspect can dramatically improve overall performance in both index (writes) and query (reads) phases. In those cases where it’s not possible to fit all of the index in the filesystem cache (the index can easily reach a size that is relatively small in terms of disk space but definitely huge in terms of memory), the system memory should be enough to allow efficient load and unload management of that filesystem cache.
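The reasoning above boils down to simple arithmetic: whatever remains after the JVM heap and the operating system's own needs is what the filesystem cache can use. The sketch below only illustrates that subtraction. The class name, the method names, and all the figures (16 GiB of RAM, a 4 GiB heap, a 2 GiB reserve for the OS, an 8 GiB index) are hypothetical, and a real JVM also uses some memory outside the heap, so treat the result as a rough upper bound.

```java
class FileSystemCacheEstimate {

    static final long GIB = 1024L * 1024L * 1024L;

    // Memory left for the OS filesystem cache once the JVM heap (-Xmx)
    // and a reserve for the operating system itself have been deducted.
    static long availableForFileSystemCache(long physicalBytes,
                                            long jvmMaxHeapBytes,
                                            long osReserveBytes) {
        return Math.max(0L, physicalBytes - jvmMaxHeapBytes - osReserveBytes);
    }

    // True when the whole index could, in principle, stay cached in memory.
    static boolean indexFitsInCache(long indexBytes, long cacheBytes) {
        return indexBytes <= cacheBytes;
    }

    public static void main(String[] args) {
        // Hypothetical machine: 16 GiB RAM, 4 GiB heap, 2 GiB kept for the OS.
        long cache = availableForFileSystemCache(16 * GIB, 4 * GIB, 2 * GIB);
        long indexSize = 8 * GIB; // hypothetical index size on disk
        System.out.println("Available for filesystem cache (GiB): " + cache / GIB);
        System.out.println("Index fits entirely: " + indexFitsInCache(indexSize, cache));
    }
}
```

Note how raising the heap size directly shrinks the memory left for the filesystem cache, which is why a bigger -Xmx is not automatically an improvement.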


Disk usage

The dashboard page reports information about the swap space, but it says nothing about disk usage. This is because that kind of information is reported in a dedicated section for every managed core. Unfortunately, there isn’t a central point where it’s possible to see the total disk space used by the instance.

As described in the previous section, the disk is a resource widely used by Solr, and its role is fundamental for getting optimal performance. Here, we can add additional information by mentioning Solid State Disks (SSD), which are usually a very good choice for getting fast reads and writes. But again, the most critical factor is understanding and tuning the filesystem cache; in the most extreme cases, this avoids disk seeks entirely. To put it in a nutshell: SSDs are fast, but memory is better.


File descriptors

The third bar (shown in the previous screenshot) shows the maximum number (light gray) and the number actually open (dark gray) of file descriptors associated with the Java process that runs Solr (that is, the Java process of your servlet container).

A Solr index can be composed of a lot of files that need to be opened at least once. Especially if you have many cores, frequent changes, commits, and optimizes, the incremental nature of a Solr index can lead to exhaustion of all the available file descriptors. This is usually the case where you get an IOException (too many open files).

The first place where you can manage and limit the number of files used by Solr is Solr itself. Within the solrconfig.xml file, you’ll find a <mergeFactor> parameter in the <indexConfig> section. This parameter decides how many segments will be merged at a time.

The Solr/Lucene index is composed of multiple subindexes called segments. Each segment is an independent index composed of several files. When documents are added, updated, or deleted, Solr asynchronously persists those changes by creating new segments or merging existing segments. This is the reason the total number of files composing the index will necessarily change (it changes gradually, following a reasonable amount of changes applied to your dataset). Hence, it needs to be monitored.

With a mergeFactor value set to 10 (the default value) there will be no more than nine segments at a given moment. When update thresholds (the maxBufferedDocs or ramBufferSizeMB parameters) are reached, a new segment will be created. If the total number of segments is equal to the configured mergeFactor, Solr will attempt to merge all existing segments into a new segment.
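The rule just described can be turned into a tiny simulation to see how the segment count evolves. This is a deliberately simplified sketch of the rule as stated in the paragraph above, not of Lucene's actual merge policies, which are more elaborate; the class and method names are hypothetical.

```java
class MergeFactorSketch {

    // Simulates a number of flushes (each one creates a new segment) and
    // returns { final segment count, highest count seen once merges are done }.
    static int[] simulate(int mergeFactor, int flushes) {
        int segments = 0;
        int peak = 0;
        for (int i = 0; i < flushes; i++) {
            segments++;                    // an update threshold was reached
            if (segments == mergeFactor) { // simplified rule from the text
                segments = 1;              // all segments merged into one
            }
            peak = Math.max(peak, segments);
        }
        return new int[] { segments, peak };
    }

    public static void main(String[] args) {
        int[] result = simulate(10, 25);
        System.out.println("Segments after 25 flushes: " + result[0]);
        System.out.println("Highest segment count after merging: " + result[1]);
    }
}
```

Under this simplified rule, a mergeFactor of 10 never leaves more than nine segments behind. Since every segment is made of several files, a lower mergeFactor keeps fewer files open at the price of more frequent merges.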

Another parameter in the solrconfig.xml file that has an impact on the number of open files is <useCompoundFile>. If this is set to true (note that it defaults to false), Solr will combine the files that make up a segment into a single file. While that may produce a benefit in terms of open file descriptors, it may also lead to some performance issues because of the monolithic nature of the compound file.

On top of that, there are scenarios where a lot of files are the natural consequence of your infrastructure. Think of a system with several cores, for example. The previous settings are specific to a single core, but what if you have a lot of them?

Tip: When I use Solr for library search services, I usually create at least six cores: one for the main index, one that holds the headings used for the autocompletion feature, and one for each alphabetical index (for example, authors, titles, subjects, and publishers). There are some customers who require up to 50 alphabetical indexes (which means up to 50 cores).

In such cases, after checking out your application and seeing that it effectively requires more file descriptors than the default (usually 1024), you may want to increase that limit by using the ulimit command, as follows:


# ulimit -n 5000

Here, 5000 is the new limit. Note that this command requires root privileges and it applies that limit only to the current session. If you want it to be permanent, that value has to be configured in the /etc/security/limits.conf configuration file.
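Before raising the limit, it is worth checking how close the process actually is to it. The same two figures shown in the dashboard bar can be read from inside any Java process through the platform management beans. The following is a minimal sketch with hypothetical class and method names; it relies on com.sun.management.UnixOperatingSystemMXBean, a JDK-specific interface that is only implemented on Unix-like systems, so the method returns null elsewhere.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

class FileDescriptorCheck {

    // Returns { open descriptors, maximum descriptors } for the current JVM,
    // or null when the platform does not expose these counters.
    static long[] fileDescriptorUsage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] usage = fileDescriptorUsage();
        if (usage == null) {
            System.out.println("File descriptor counters are not available on this platform.");
        } else {
            System.out.println("Open file descriptors: " + usage[0] + " of " + usage[1]);
        }
    }
}
```

Run standalone, this reports the figures for its own small process. To observe Solr, the same attributes would have to be read from the JVM that hosts Solr, for example through JMX.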


Logging

The Administration Console allows you to see log messages (also available in a log file) and change the log settings.

While the first feature is useful only if you don’t have access to the log files (inspecting log files with Unix command-line tools is definitely more powerful than doing the same with the AJAX-refreshed page), managing log settings is very useful because it doesn’t require manual edits or server restarts. So, if you want to limit the priority level of log messages on-the-fly, or debug the behavior of a component, this is the right place to do so.

Tip: A verbose log level can slow down index operations, so it’s better to check log settings before calling the /update request handler. For the same reason, remember that Solr logs all query requests at the INFO level. Depending on how many users your application has, this could lead to a huge amount of log messages.


Core Admin

The Core Admin section is a central point where you can manage registered cores. You can create a new core on-the-fly (assuming that the core instance and data directories exist on the disk) or manage the existing cores one by one, selecting them from the list on the left. The following screenshot shows the Core Admin page of the Solr instance set up for this chapter:

The top toolbar contains these buttons:

Button      Description

Unload      Unloads the core. The core will be removed after pending requests are processed.

Rename      Changes the core name. Note that this change will affect the URI endpoints of the core services.

Swap        Swaps two active cores. This is useful for switching between two versions (that is, online and offline versions) of the same core. Note that both of them will still be alive after issuing the swap command.

Reload      Reloads a core. The current core instance will be available only for satisfying pending requests. This command is useful if some (backward-compatible) changes have been made to the solrconfig.xml or schema.xml configuration files or core libraries and you want to load those changes.

Optimize    Issues an optimize command to the selected core.

The central area shows the following information about the core and the corresponding index:

Attribute      Description

startTime      The core start (or reload) time.

instanceDir    The top core folder. It contains a conf subfolder that contains Solr configuration files (schema.xml, solrconfig.xml, and dependent files).

dataDir        The folder containing the index data files.

lastModified   The last modification date of the index.

version        A version number assigned to the IndexReader instance associated with the index.

numDocs        The number of searchable documents in the index. In other words, this is the number of documents you can get back from a *:* query.

maxDocs        The number of internal document identifiers actually in use. The difference between maxDocs and numDocs indicates how many documents have been deleted or replaced. The old (deleted and replaced) identifiers are gradually removed during merges or after issuing an index optimize.

deletedDocs    The number of deleted documents. It also includes replaced documents because Solr doesn’t actually support updates; it simply deletes a given document and subsequently adds its new version. This is basically the difference between maxDocs and numDocs after a commit and before merging or optimizing.

optimized      Indicates whether the index has been optimized.

current        Indicates whether the index has been committed.

directory      The underlying Lucene Directory implementation.


Java properties and thread dump

Java properties form a read-only section where you can see the system properties associated with the current JVM instance.

Tip: Remember that you can use those variables in solrconfig.xml, so you may want to check in this page whether a specific property has the expected value.

The thread dump page shows a snapshot of what live threads in the JVM are doing at a given instant. The same information can be retrieved using the jstack command-line utility shipped with the JDK.

Tip: Thread dumps are very useful for debugging high-CPU-usage scenarios and deadlocks.

Unlike log analysis, the user interface here is definitely more user-friendly than manual inspection of the jstack output.
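For completeness, a comparable snapshot can also be taken programmatically with the standard java.lang.management API. The following is a minimal sketch (the class and method names are hypothetical) that prints the name, state, and stack of every live thread and counts threads deadlocked on object monitors. It inspects the JVM it runs in, so to observe Solr it would have to run inside, or connect through JMX to, the JVM hosting Solr.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class ThreadDumpSketch {

    // Renders a plain-text snapshot of all live threads and their stacks.
    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false, false: skip locked monitors and ownable synchronizers.
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    // Number of threads deadlocked waiting for object monitors (0 if none).
    static int countMonitorDeadlockedThreads() {
        long[] ids = ManagementFactory.getThreadMXBean().findMonitorDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) {
        System.out.print(dump());
        System.out.println("Monitor-deadlocked threads: " + countMonitorDeadlockedThreads());
    }
}
```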


Core overview

Selecting one of the available cores in the drop-down list on the left side of the Administration Console will open a core dedicated area, with several other sections. The first section is an overview of the selected core. It reports more or less the same information that we saw in the dashboard and in the core admin page.

Here, there is additional information about the health check (heartbeat information enabled only if you configured the ping request handler) and the replication status.

The replication section shows the index status of the master and slave (only if the current Solr instance acts as a slave) in terms of replicability.

Tip: The replication section is useful for monitoring master-repeater-slave instances, especially when you get some synchronization issues within the Solr ensemble. Note that the console also has a dedicated Replication section where that information is more detailed.

The master-slave replication architecture is explained in the next chapter.


Caches

To speed up query execution, Solr stores data using several types of in-memory caches. Caches transparently store filters, documents, and identifiers so that future requests for the same data can be served faster. If you run the same search twice, you will see in the Solr logs a marked difference between the first and the second query in terms of response time, as shown in the following example:

… params={q=history&fq=catalog:NRA} hits=17298 status=0 QTime=78
… params={q=history&fq=catalog:NRA} hits=17298 status=0 QTime=2

Solr comes with several kinds of caches. They can be configured and tuned in solrconfig.xml:

<filterCache class="solr.FastLRUCache" size="512" initialSize="512"
    autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512"
    autowarmCount="0"/>
<documentCache class="solr.LRUCache" size="512" initialSize="512"
    autowarmCount="0"/>
<fieldValueCache class="solr.FastLRUCache" size="512" autowarmCount="128"
    showItems="32"/>

The following table briefly describes the types of caches available in Solr:

Cache              Description

FilterCache        Holds the document identifiers associated with filter queries that have been executed.

QueryResultCache   Holds the document identifiers resulting from queries that have been executed.

DocumentCache      Holds Lucene document instances for quick access to their stored fields.

FieldCache         A low-level Lucene field cache that is not managed by Solr (in other words, it cannot be configured). It is used for sorting and faceting.

FieldValueCache    This is a field cache very similar to FieldCache, but it can be configured. It is mainly used for faceting.

CustomCache        Application-level caches used to hold custom user/application data.


Cache life cycles

A cache is always associated with an index searcher instance, and it follows the same life cycle as that instance. This means that, when an index searcher is instantiated (on startup or after a commit), cache instances are created and associated with it. As a consequence of this, caches and cached objects don’t have an expiry time; they will be valid as long as the owning index searcher instance is active.

When a searcher is instantiated, and if it is not the first searcher (that is, the one created at startup time), caches can be optionally auto-warmed; that is, they can be prepopulated with some data coming from their previous colleagues (caches from the previous searcher). The autowarmCount attribute allows us to declare the maximum amount of data (absolute or a percentage) that can be used to prepopulate the new cache.

Note: Data from the previous cache is not taken as it is. It has to be validated against the new searcher’s “view” of the index. A previously cached object may no longer be valid after the new searcher has been opened; it could have been deleted. The autowarmCount attribute refers only to valid entries.

When a new searcher is opened, the current searcher will continue to serve pending requests. After that, it will be closed and the orphan caches will be subjected to garbage collection.


Cache sizing

Cache size can refer to two different measures: the total count of objects a cache contains at a specific moment, and the maximum number of objects a cache can hold.

Within solrconfig.xml, you can configure the minimum (initial) and maximum size of a cache by means of the initialSize and size attributes, respectively:

<filterCache … class="…" size="512" initialSize="512"/>

The initialSize attribute is used when the cache instance is created. It preallocates a given number of slots for objects that will be cached.

The ideal dimension of a cache strictly depends on the application. One could erroneously think "the bigger, the better", but this is a half-truth; a huge cache would have the advantage of holding all the required structures in memory, thus allowing fast access to that information. However, unless your index is completely static and it never changes, you will sooner or later add, update, or remove something, and you will need to commit those changes. A commit will open a new searcher, which in turn will create new caches, and the (old) huge caches will be discarded.

In this situation, the garbage collector will have a lot of work to do reclaiming all objects from the old caches. Worse, if you have configured auto-warming, the prepopulation of the newly created caches could take a lot of time.

In other words, this scenario requires a lot of memory to manage all of those objects. From my experience, I can tell you that this is one of the common ways of getting “OutOfMemory” error messages. Remember that garbage collection is not under your control, so most probably there will be a given interval of time during which the JVM must hold both new and old object references.

The suggestion here is to start with default sizes, and then use the Solr Administration Console to constantly monitor how things move. Cache management is not a do-once-and-forget task. Caches must be periodically monitored and tuned where necessary in order to gain optimal advantage for your application.


Cached object life cycle

The class attribute of a cache determines primarily its implementation, but most importantly, it defines how objects are managed within the cache. In other words, it implements the logic needed to know what to do when the cache reaches its maximum size and which objects must be evicted when a new entry arrives.

Solr offers three cache implementations:

LRUCache: Once the maximum size of the cache has been reached and a new object needs to be cached, this implementation will remove the oldest entry. The age of an object is determined by the last time it was requested from the cache.
FastLRUCache: This implements behavior similar to LRUCache but uses a separate thread to (asynchronously) clean up the oldest entries.
LFUCache: This policy implements an eviction based on the popularity of each object in the cache (that is, how many times a given object in the cache has been requested).
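To make the least-recently-used policy concrete, here is a minimal sketch built on the JDK's LinkedHashMap in access-order mode. It is not Solr's LRUCache implementation; it only illustrates the eviction rule described above, and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {

    private static final long serialVersionUID = 1L;

    private final int maxSize;

    LruCacheSketch(int maxSize) {
        // accessOrder = true: iteration order goes from the least recently
        // accessed entry to the most recently accessed one.
        super(16, 0.75f, true);
        this.maxSize = maxSize;
    }

    // Called by LinkedHashMap after each insertion; returning true evicts
    // the eldest (least recently used) entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize;
    }

    public static void main(String[] args) {
        LruCacheSketch<String, String> cache = new LruCacheSketch<>(2);
        cache.put("genre:Jazz", "1,2,3");
        cache.put("genre:Fusion", "4,5,6,7");
        cache.get("genre:Jazz");             // refreshes the "age" of this entry
        cache.put("released:1986", "6,8");   // the cache is full: one entry must go
        System.out.println(cache.keySet());
    }
}
```

In the usage above, genre:Fusion is the entry that gets evicted, because genre:Jazz was requested more recently. An LFU policy would instead track a request counter per entry and evict the entry with the lowest count.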


Cache stats

For each cache, the Administration Console reports (Plugin/Stats | Cache) the following attributes:

Attribute              Description

lookups                The total count of lookup requests.

hits                   The number of requests that successfully found the requested object.

hitratio               The ratio of hits to the total number of lookups. A value of 1 represents optimal usage of the cache (every requested object has been found in the cache).

inserts                The total number of inserted objects.

evictions              The total number of evictions (objects removed).

size                   The current size of the cache.

warmupTime             The time needed to auto-warm the cache.

cumulative_lookups

cumulative_hits

cumulative_hitratio

cumulative_inserts

cumulative_evictions

A cache instance dies when the associated searcher is discarded. The cumulative attributes retain lookups, hits, hit ratio, inserts, and evictions among all cache instances (of the same type), so the value of those attributes measures the same things we just saw but cumulatively, since Solr startup.
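The relationship between these counters is easy to see in a toy cache that keeps its own statistics. The following sketch is purely illustrative (the class and its members are hypothetical and unrelated to Solr's internal classes): every get counts as a lookup, a successful get counts as a hit, and the hit ratio is hits divided by lookups.

```java
import java.util.HashMap;
import java.util.Map;

class CacheStatsSketch {

    private final Map<String, String> entries = new HashMap<>();
    long lookups;
    long hits;
    long inserts;

    String get(String key) {
        lookups++;
        String value = entries.get(key);
        if (value != null) {
            hits++;
        }
        return value;
    }

    void put(String key, String value) {
        inserts++;
        entries.put(key, value);
    }

    // 1.0 means that every requested object was found in the cache.
    double hitRatio() {
        return lookups == 0 ? 0.0 : (double) hits / lookups;
    }

    public static void main(String[] args) {
        CacheStatsSketch cache = new CacheStatsSketch();
        cache.get("genre:Jazz");            // miss: nothing cached yet
        cache.put("genre:Jazz", "1,2,3");
        cache.get("genre:Jazz");            // hit
        cache.get("genre:Jazz");            // hit
        cache.get("genre:Fusion");          // miss
        System.out.println("lookups=" + cache.lookups + " hits=" + cache.hits
                + " inserts=" + cache.inserts + " hitratio=" + cache.hitRatio());
    }
}
```

A persistently low hit ratio together with a high number of evictions usually suggests that the cache is too small for the workload, while a low ratio with few evictions suggests that requests are simply rarely repeated.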


Types of cache

As we have briefly described, Solr comes with several kinds of caches. The following paragraphs describe them further.

Filter cache

Each time a filter query is executed, Solr places a new entry in a filter cache. A filter cache is a kind of map where the key is represented by the filter query string (for example, catalog:NRA or genre:Jazz) and the entry is a list of all matching document identifiers.

The filter cache is configured in the solrconfig.xml file, in the following fragment:

<filterCache class="solr.FastLRUCache" size="512" initialSize="512"
    autowarmCount="0"/>

Filter queries play a crucial role in performance and response time optimization. The cached identifiers can be used and reused with subsequent queries; briefly, requests that contain cached filter queries will improve overall performance because those queries won’t be actually executed again.

Auto-warming a filter cache means refreshing every cached filter query result by executing (again) all of those queries against the index view represented by the new searcher. Let’s see this with a concrete example; the sample Solr instance contains 24 albums. At startup time, the filter cache is empty. Now let’s suppose the following queries are executed:

http://127.0.0.1:8983/solr/example/query?q=*:*&fq=genre:Jazz (3 results)

http://127.0.0.1:8983/solr/example/query?q=*:*&fq=genre:Fusion (4 results)

http://127.0.0.1:8983/solr/example/query?q=*:*&fq=released:1986 (2 results)

The three filter queries populate the filter cache as described in the following table:

Cache entries (filter queries)    Query results (document identifiers)

genre:Jazz                        1, 2, 3

genre:Fusion                      4, 5, 6, 7

released:1986                     6, 8

Now we decide to remove document #6. In order to do this, we send a delete command and then a commit command. Once the change has been committed, document #6 no longer exists. A new searcher is opened, and the cache content needs to be refreshed because it still contains an invalid entry. So, the auto-warming process simply repeats each filter query in the cache (genre:Jazz, genre:Fusion, and released:1986 in this case) and refreshes the content with valid query results. After the auto-warming, the filter cache will have the following content:

Cache entries (filter queries)    Query results (document identifiers)

genre:Jazz                        1, 2, 3

genre:Fusion                      4, 5, 7

released:1986                     8

Thisre-executionisingeneralthecostofauto-warming,whichisdirectlyconnectedwiththecachesize(ahugecacheinmostcaseswilltakesometimetore-executeallcachedqueries).
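The delete-and-rewarm example above can be reproduced with a toy model. The following is only a sketch, not Solr's implementation: the class FilterCacheSketch, the Album type, and the eight sample albums are hypothetical, and a filter query is represented by a Java predicate instead of a parsed Lucene query.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.function.Predicate;

/** Toy model of a filter cache with auto-warming; not Solr's actual implementation. */
final class FilterCacheSketch {

    /** A document reduced to the fields used by the example filter queries. */
    static final class Album {
        final int id;
        final String genre;
        final int released;

        Album(int id, String genre, int released) {
            this.id = id;
            this.genre = genre;
            this.released = released;
        }
    }

    /** Filter query string -> identifiers of the matching documents. */
    private final Map<String, TreeSet<Integer>> entries = new LinkedHashMap<>();
    /** Filter query string -> executable form of the query (a stand-in for query parsing). */
    private final Map<String, Predicate<Album>> filters = new LinkedHashMap<>();

    /** Returns the cached result for fq, executing the filter only on a cache miss. */
    TreeSet<Integer> filter(String fq, Predicate<Album> filter, List<Album> index) {
        filters.put(fq, filter);
        return entries.computeIfAbsent(fq, key -> execute(filter, index));
    }

    /** Auto-warming: re-executes every cached filter query against the new index view. */
    void autowarm(List<Album> newIndexView) {
        entries.replaceAll((fq, stale) -> execute(filters.get(fq), newIndexView));
    }

    TreeSet<Integer> cached(String fq) {
        return entries.get(fq);
    }

    private static TreeSet<Integer> execute(Predicate<Album> filter, List<Album> index) {
        TreeSet<Integer> ids = new TreeSet<>();
        for (Album album : index) {
            if (filter.test(album)) {
                ids.add(album.id);
            }
        }
        return ids;
    }

    /** Eight hypothetical albums, consistent with the identifiers in the tables above. */
    static List<Album> sampleIndex() {
        List<Album> index = new ArrayList<>();
        for (int id = 1; id <= 3; id++) index.add(new Album(id, "Jazz", 1990));
        for (int id = 4; id <= 7; id++) index.add(new Album(id, "Fusion", id == 6 ? 1986 : 1992));
        index.add(new Album(8, "Rock", 1986));
        return index;
    }

    public static void main(String[] args) {
        List<Album> index = sampleIndex();
        FilterCacheSketch cache = new FilterCacheSketch();
        cache.filter("genre:Jazz", a -> a.genre.equals("Jazz"), index);
        cache.filter("genre:Fusion", a -> a.genre.equals("Fusion"), index);
        cache.filter("released:1986", a -> a.released == 1986, index);

        index.removeIf(a -> a.id == 6); // delete document #6 and commit
        cache.autowarm(index);          // the new searcher refreshes the cache
        System.out.println(cache.cached("genre:Fusion") + " " + cache.cached("released:1986"));
    }
}
```

Note that the warming cost grows with the number of cached entries, because every entry triggers one execution against the index.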

Query Result cache

With this kind of cache, each time a query is executed, its results (in terms of matching document identifiers) are cached for future reuse. This is configured in the following fragment of the solrconfig.xml file:

<queryResultCache class="solr.FastLRUCache" size="512" initialSize="512"
                  autowarmCount="0"/>

The underlying reason is that popular queries (that is, queries that are often repeated) will gain a clear advantage here because they won't actually be executed again; their results are already computed.

Note: Other than popular queries, pagination mechanisms also benefit from this cache. When the user asks for the next or the previous page of results for a given query execution, Solr will repeat the query but with a different start parameter, so the identifiers already cached for that query can often be reused.

Document cache

Both FilterCache and QueryResultCache store document identifiers. So, for a given query, Solr computes the matching identifiers; for each of them, it needs to query the index to retrieve its stored fields. After that, the response is populated with those documents and their corresponding (stored) fields.

DocumentCache caches Lucene documents, so once a query has been executed, Solr doesn't need (with regard to documents that are found in this cache) to query the index to populate the list of results.

Tip: If you have huge stored fields (for example, full-text fields used for highlighting), be aware that you cannot specify which fields must be in the cache. Therefore, huge fields may require a lot of memory.

Field value cache

The field value cache has a map structure where keys are field names and values are uninverted fields. This structure maps document identifiers with values. If it is not explicitly declared, this cache is automatically generated with an initial size of 10, a maximum size of 10000, and no auto-warming. It is primarily used for faceting.

Custom cache

Custom caches are intended for developers who write their own Solr extensions. Unlike the other types, custom caches accept a regenerator attribute, which declares a class that implements the auto-warming logic for the cache.

Query handlers

The page accessed by navigating to Plugin / Stats | Query Handler shows an expandable list where each item is a query handler configured in solrconfig.xml. This list includes handlers that represent search endpoints (that is, SearchHandler) but also other handlers such as /admin/ping, /admin/dump, and /debug.

The configured UpdateRequestHandler instances (for example, /update and /update/json), being subclasses of RequestHandler, are also listed on this page.

For each handler, the console shows some basic attributes such as the class name, version, a short description, and a set of statistical data, as listed in the following table:

| Attribute | Description |
|---|---|
| handlerStart | The timestamp (in milliseconds) at which the handler was started. |
| requests | The total number of requests received. |
| errors | The number of requests that raised an exception during the execution. |
| timeouts | If the query is executed with the timeAllowed parameter and the given timeout expires, Solr will return only partial results. This attribute counts the requests that face this scenario. |
| totalTime | The total (requests) execution time. |
| avgRequestsPerSecond | The average number of requests per second. |
| 5minRateReqsPerSecond, 15minRateReqsPerSecond | The average number of requests per second over the last five and fifteen minutes, respectively. |
| avgTimePerRequest | The average (request) execution time. |
| 75thPcRequestTime, 95thPcRequestTime, 99thPcRequestTime, 999thPcRequestTime | Starting from the distribution of the request execution times, these attributes report the value at the 75th, 95th, 99th, and 99.9th percentile of that distribution, respectively. |
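To make the percentile attributes concrete, here is a minimal sketch that computes them with the nearest-rank method over a fixed set of samples. This is an assumption made for illustration: Solr derives these values from its own running statistics, and the class RequestTimePercentiles is hypothetical.

```java
import java.util.Arrays;

/** Nearest-rank percentiles over request times; a simplification, not Solr's own estimator. */
final class RequestTimePercentiles {

    /** Returns the smallest sample such that at least percentile% of the samples are <= it. */
    static double percentile(double[] requestTimesMs, double percentile) {
        if (requestTimesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double[] sorted = requestTimesMs.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(percentile * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        double[] times = new double[1000];
        for (int i = 0; i < times.length; i++) {
            times[i] = i + 1; // hypothetical request times: 1 ms .. 1000 ms
        }
        for (double p : new double[] {75, 95, 99, 99.9}) {
            System.out.println(p + "th percentile: " + percentile(times, p) + " ms");
        }
    }
}
```

Read this way, a 99thPcRequestTime of 990 ms means that 99 percent of the observed requests completed in 990 ms or less.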

So, especially for search endpoints, this page is very useful to understand and monitor the usage and the statistical behavior of your Solr instance.

Update handlers

Under the same path (Plugin / Stats), Update Handler is a page containing an entry corresponding to the org.apache.solr.update.DirectUpdateHandler2 instance.

The following table lists and describes the attributes of that handler:

| Attribute | Description |
|---|---|
| commits | The total number of commit requests received. |
| autocommit maxTime | The maximum amount of time that is allowed to pass since a document was added before automatically triggering a new commit. |
| autocommits | The total number of hard auto-commits executed. |
| soft autocommits | The total number of soft auto-commits executed. |
| optimizes | The total number of optimize requests received. |
| rollbacks | The total number of rollback requests received. |
| expungeDeletes | The total number of hard commits with the expungeDeletes flag set to true. |
| docsPending | The total number of updates that have been processed but not committed. |
| adds | The total number of add requests received. |
| deletesById | The total number of deleteById requests received. |
| deletesByQuery | The total number of deleteByQuery requests received. |
| errors | The total number of failed operations (for example, updates, commits, and rollbacks). |
| cumulative_adds, cumulative_deletesById, cumulative_deletesByQuery, cumulative_errors | UpdateHandler has a lifecycle associated with the owning SolrCore. In other words, when the SolrCore is reloaded, a new instance of UpdateHandler is created. The monitoring attributes prefixed with cumulative are a cumulative measure of a specific attribute (for example, additions and deletions) since the Solr startup. |

Most Solr installations I've done in libraries update the index on a daily basis. Each morning, the Update Handler stats page shows a perfect summary of what happened during the previous day and cumulatively since the last startup. Clearly, in the event of errors, log files serve as my friends.

On the other hand, if I need to monitor the overall progress of an index update in real time, then I prefer the JMX way, which is described in the next section.

JMX

Java Management Extensions (JMX) is a powerful set of APIs used to monitor and manage a running JVM. The building blocks of JMX are the so-called Management Beans (MBeans), which are basically wrappers that decorate existing objects with a management interface. The core classes of the JVM are decorated with MBeans.

Tip: More information about JMX can be found at http://www.oracle.com/technetwork/java/javase/tech/javamanagement-140525.html.

MBeans are registered with an MBean Server that exposes those management interfaces to external clients. Applications are free to create, register, and expose the management interface of their own specific services. Solr MBeans are not automatically registered with the MBean Server, but if you want to do that, just write (or uncomment) the following line in solrconfig.xml:

<jmx/>

The JDK comes with two built-in JMX clients called JConsole and JVisualVM.

Tip: JVisualVM and JConsole are very similar tools. Here, we will talk only about JConsole because JVisualVM doesn't have the MBeans perspective by default.

Open a shell on your PC and type the following command:

# $JAVA_HOME/bin/jconsole

A dialog will pop up. This is the first screen of JConsole, which is a standalone Java application. The dialog contains a list of locally running JVMs. One of them should be the one where Solr is running. Select that entry, and you should see a screen with several tabs: Overview, Memory, Threads, Classes, VM Summary, and MBeans. At the moment, we are interested in the last tab, MBeans. Here you can see (in the tree component on the left side) all registered MBeans, as depicted in the following screenshot:

For each MBean in the tree, you can see its management interface in the right pane. A management interface is composed of attributes and operations.

Operations can be invoked, and attributes can be monitored by looking at their value at a given moment or over a given interval. To do the latter, double-click on an attribute to activate a real-time chart.
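What JConsole shows in its MBeans tab can also be read programmatically through the standard javax.management API. The sketch below inspects the MBeans of its own JVM, using the built-in java.lang:type=Runtime MBean as the example; the class MBeanBrowser is hypothetical, and the solr*:* pattern mentioned in the comment is an assumption about the domain name used by Solr when <jmx/> is enabled.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.TreeSet;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Lists MBeans and reads their attributes from the MBean Server of the current JVM. */
final class MBeanBrowser {

    private static final MBeanServer SERVER = ManagementFactory.getPlatformMBeanServer();

    /** Returns the canonical names of the registered MBeans that match the pattern. */
    static Set<String> names(String pattern) {
        try {
            Set<String> names = new TreeSet<>();
            for (ObjectName name : SERVER.queryNames(new ObjectName(pattern), null)) {
                names.add(name.getCanonicalName());
            }
            return names;
        } catch (JMException e) {
            throw new IllegalArgumentException("invalid pattern: " + pattern, e);
        }
    }

    /** Reads the current value of one attribute of one MBean. */
    static Object attribute(String mbeanName, String attribute) {
        try {
            return SERVER.getAttribute(new ObjectName(mbeanName), attribute);
        } catch (JMException e) {
            throw new IllegalArgumentException("cannot read " + attribute + " of " + mbeanName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(names("java.lang:*"));
        System.out.println(attribute("java.lang:type=Runtime", "VmName"));
        // Inside a JVM that runs Solr with <jmx/> enabled, a pattern such as "solr*:*"
        // (an assumed domain name) could be used to list the Solr MBeans instead.
    }
}
```

Polling attribute(...) at a fixed interval gives the same kind of time series that JConsole draws in its real-time charts.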

The main differences between the Solr Administration Console and JConsole are as follows:

- The Solr Administration Console, being a web application, offers static snapshots of the system. With JConsole, it's possible to activate real-time monitoring of one or more attributes. This is not limited to MBean attributes. In the other tabs, you can monitor threads, processors, memory, and garbage collection.
- JConsole has a finer level of granularity than the Administration Console. There, we can see all attributes and operations exposed for management.
- JConsole, being more technical, is less usable than the Administration Console.

Clearly, JConsole, JVisualVM, and the Solr Administration Console are not alternatives. They should be used together in order to get different perspectives on the system.

Summary

In this chapter, we described some concepts about Solr administration and monitoring. We introduced a few system administration tools such as the Solr Administration Console and JConsole, and we covered hardware resources.

Remember that, although the topics covered in this chapter should be relevant for an administrator, nowadays this role is often spread among several people (especially in small and medium companies) who are mostly developers (a developer in a small or medium company is like a “factotum”). This is the reason it is important for non-administrators to have at least a basic understanding of administration, management, and monitoring.

In the next chapter, you will see how Solr can be deployed in the context of development, testing, and production. We will illustrate and describe several deployment scenarios, starting from the simplest, standalone instance, continuing with a gradually growing level of complexity, and ending with SolrCloud. SolrCloud is a highly available, fault-tolerant cluster of Solr servers that provides distributed indexing and search capabilities.

Chapter 6. Deployment Scenarios

This chapter contains information on the various ways in which you can deploy Solr, including key features and pros and cons for each scenario.

Solr has a wide range of deployment alternatives, from monolithic to distributed indexes and standalone to clustered instances. We will organize this chapter by deployment scenarios, with a growing level of complexity.

This chapter will cover the following topics:

- Sharding
- Replication: master, slave, and repeaters
- SolrCloud

Standalone instance

All the examples we found in the previous chapters use a standalone instance of Solr, that is, one or more cores managed by a Solr deployment hosted in a standalone servlet container (for example, Jetty, Tomcat, and so on).

This kind of deployment is useful for development because, as you learned, it is very easy to start and debug. Besides, it can also be suitable for a production context if you don't have strict non-functional requirements and have a small or medium amount of data.

Tip: I have used a standalone instance to provide autocomplete services for small and medium intranet systems.

Anyway, the main features of this kind of deployment are simplicity and maintainability; one simple node acts as both an indexer and a searcher. The following diagram depicts a standalone instance with two cores:

Shards

When a monolithic index becomes too large for a single node or when additions, deletions, or queries take too long to execute, the index can be split into multiple pieces called shards.

Note: The previous sentence highlights a logical and theoretical evolution path of a Solr index. However, this (in general) is valid for all scenarios we will describe. It is strongly recommended that you perform a preliminary analysis of your data and the estimated growth factor in order to decide from the beginning the right configuration that suits your requirements. Although it is possible to split an existing index into shards (https://lucene.apache.org/core/4_10_3/misc/org/apache/lucene/index/PKIndexSplitter.html), things definitely become easier if you start directly with a distributed index (if you need it, of course).

The index is split vertically so that each shard contains a disjoint subset of the entire index. Solr will query and merge results across those shards. The following diagram illustrates a Solr deployment with three nodes; this deployment consists of two cores (C1 and C2) divided into three shards (S1, S2, and S3):

When using shards, only query requests are distributed. This means that it's up to the indexer to add and distribute the data across nodes, and to subsequently forward a change request (that is, delete, replace, and commit) for a given document to the appropriate shard (the shard that owns the document).

Tip: The Solr Wiki recommends a simple, hash-based algorithm to determine the shard where a given document should be indexed:

documentId.hashCode() % numServers

Using this approach is also useful in order to know in advance where to send delete or update requests for a given document.
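One detail is worth keeping in mind when the hash-based rule is written in Java: hashCode() can return a negative number, and the % operator keeps the sign of its left operand, so the plain expression can yield a negative shard index. The following sketch uses Math.floorMod to avoid that; the class ShardRouter and the shard addresses are hypothetical.

```java
/** Hash-based shard selection; a sketch of the rule quoted above, not a Solr API. */
final class ShardRouter {

    /** Maps a document identifier to a shard index in the range [0, numServers). */
    static int shardFor(String documentId, int numServers) {
        // floorMod returns a non-negative result for a positive divisor,
        // even when hashCode() is negative.
        return Math.floorMod(documentId.hashCode(), numServers);
    }

    public static void main(String[] args) {
        // Hypothetical shard addresses, in a fixed order known to the indexer.
        String[] shards = {"localhost:8080/solr/c1", "localhost:8081/solr/c2"};
        for (String id : new String[] {"9920", "4392", "polygenelubricants"}) {
            System.out.println(id + " -> " + shards[shardFor(id, shards.length)]);
        }
    }
}
```

Because the mapping depends on numServers, changing the number of shards moves most documents to a different shard, which is one more reason to size the deployment in advance.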

On the opposite side, a searcher client will send a query request to any node, but it has to specify an additional shards parameter that declares the target shards that will be queried. In the following example, assuming that two shards are hosted in two servers listening to ports 8080 and 8081, the same request will produce the same result whichever of the two nodes it is sent to:

http://localhost:8080/solr/c1/query?q=*:*&shards=localhost:8080/solr/c1,localhost:8081/solr/c2
http://localhost:8081/solr/c2/query?q=*:*&shards=localhost:8080/solr/c1,localhost:8081/solr/c2
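As a small illustration of how a client could assemble such a request, the following sketch joins the shard addresses into the shards parameter. The class DistributedQueryUrl is hypothetical, and, like the URLs above, it does not URL-encode the parameter values, which a real client should do.

```java
import java.util.Arrays;
import java.util.List;

/** Builds a distributed query URL with an explicit shards parameter. */
final class DistributedQueryUrl {

    /** The node at nodeCoreUrl receives the request and fans it out to the given shards. */
    static String build(String nodeCoreUrl, String query, List<String> shards) {
        return nodeCoreUrl + "/query?q=" + query + "&shards=" + String.join(",", shards);
    }

    public static void main(String[] args) {
        List<String> shards = Arrays.asList("localhost:8080/solr/c1", "localhost:8081/solr/c2");
        // The same shard list is used regardless of the node that receives the request.
        System.out.println(build("http://localhost:8080/solr/c1", "*:*", shards));
        System.out.println(build("http://localhost:8081/solr/c2", "*:*", shards));
    }
}
```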

When sending a query request, a client can optionally include, in the fl parameter, a pseudo field associated with the [shard] transformer. In this case, as a part of each returned document, there will be additional information indicating the owning shard. This is an example of such a request:

http://localhost:8080/solr/c1/query?q=*:*&shards=localhost:8080/solr/c1,localhost:8081/solr/c2&fl=*,src_shard:[shard]

Here is the corresponding response (note the pseudo field aliased as src_shard):

<result name="response" numFound="192" start="0">
  <doc>
    <str name="id">9920</str>
    <str name="brand">Fender</str>
    <str name="model">Jazz Bass</str>
    <arr name="artist">
      <str>Marcus Miller</str>
    </arr>
    <str name="series">Marcus Miller signature</str>
    <str name="src_shard">localhost:8080/solr/shard1</str>
  </doc>
  <doc>
    <str name="id">4392</str>
    <str name="brand">Music Man</str>
    <str name="model">StingRay</str>
    <arr name="artist"><str>Tony Levin</str></arr>
    <str name="series">5 strings DeLuxe</str>
    <str name="src_shard">localhost:8081/solr/shard2</str>
  </doc>
</result>

The following are a few things to keep in mind when using this deployment scenario:

- The schema must have a uniqueKey field. This field must be declared as stored and indexed; in addition, it is supposed to be unique across all shards.
- Inverse Document Frequency (IDF) calculations cannot be distributed. IDF is computed per shard.
- Joins between documents belonging to different shards are not supported.
- If a shard receives both index and query requests, the index may change during a query execution, thus compromising the outgoing results (for example, a matching document that has been deleted).
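The effect of per-shard IDF can be seen with a short calculation. The sketch below assumes the classic Lucene TF-IDF formula, idf = 1 + ln(numDocs / (docFreq + 1)), and hypothetical document counts; it only shows that the same term gets a different weight on each shard, and that neither matches the weight a single monolithic index would assign.

```java
/** Compares per-shard IDF values with the IDF of an equivalent monolithic index. */
final class ShardIdf {

    /** Classic Lucene-style IDF (assumed here): 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        // Hypothetical figures: the term is rare on shard 1 and common on shard 2.
        double shard1 = idf(1, 100);
        double shard2 = idf(50, 100);
        double monolithic = idf(1 + 50, 100 + 100);
        System.out.println("shard 1: " + shard1);
        System.out.println("shard 2: " + shard2);
        System.out.println("single index: " + monolithic);
    }
}
```

When documents are distributed evenly (for example, with the hash-based rule), term statistics tend to be similar across shards and the difference is usually small.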

Master/slaves scenario

In a master/slaves scenario, there are two types of Solr servers: an indexer (the master) and one or more searchers (the slaves).

The master is the server that manages the index. It receives update requests and applies those changes. A searcher, on the other hand, is a Solr server that exposes search services to external clients.

The index, in terms of data files, is replicated from the indexer to the searcher through HTTP by means of a built-in RequestHandler that must be configured on both the indexer side and the searcher side (within the solrconfig.xml configuration file).

On the indexer (master), a replication configuration looks like this:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">optimize</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

The replication mechanism can be configured to be triggered after one of the following events:

- Commit: A commit has been applied
- Optimize: The index has been optimized
- Startup: The Solr instance has started

In the preceding example, we want the index to be replicated after startup and after each optimize command. Using the confFiles parameter, we can also indicate a set of configuration files (schema.xml and stopwords.txt, in the example) that must be replicated together with the index.

Note: Remember that changes on those files don't trigger any replication. Only a change in the index, in conjunction with one of the events we defined in the replicateAfter parameter, will mark the index (and the configuration files) as replicable.

On the searcher side, the configuration looks like the following:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://<localhost>:<port>/solrmaster</str>
    <str name="pollInterval">00:00:10</str>
  </lst>
</requestHandler>

You can see that a searcher keeps polling the master periodically (the pollInterval parameter) to check whether a newer version of the index is available. If it is, the searcher will start the replication mechanism by issuing a request to the master, which is completely unaware of the searchers.

The replicability status of the index is actually indicated by a version number. If the searcher has the same version as the master, it means the index is the same. If the versions are different, it means that a newer version of the index is available on the master, and replication can start.

Other than separating responsibilities, this deployment configuration allows us to have a so-called diamond architecture, consisting of one indexer and several searchers. When the replication is triggered, each searcher in the ring will receive a whole copy of the index. This allows the following:

- Load balancing of the incoming (query) requests.
- An increment to the availability of the whole system. In the event of a server crash, the other searchers will continue to serve the incoming requests.

The following diagram illustrates a master/slave deployment scenario with one indexer, three searchers, and two cores:

If the searchers are in several geographically distributed data centers, an additional role called repeater can be configured in each data center in order to rationalize the replication data traffic flow between nodes. A repeater is simply a node that acts as both a master and a slave. It is a slave of the main master, and at the same time, it acts as the master of the searchers within the same data center, as shown in this diagram:

Shards with replication

This scenario combines shards and replication in order to have a scalable system with high throughput and availability. There is one indexer and one or more searchers for each shard, allowing load balancing between (query) shard requests. The following diagram illustrates a scenario with two cores, three shards, one indexer, and (due to problems with available space) only one searcher for each shard:

The drawback of this approach is undoubtedly the overall growing complexity of the system that requires more effort in terms of maintainability, manageability, and system administration. In addition to this, each searcher is an independent node, and we don't have a central administration console where a system administrator can get a quick overview of system health.

These disadvantages have been either mitigated or overcome in SolrCloud, which is described in the next section.

SolrCloud

SolrCloud is a highly available, fault-tolerant cluster of Solr servers that provides distributed indexing and search capabilities. The following diagram illustrates a simple SolrCloud scenario:

Although SolrCloud introduced a new terminology to define things in a distributed domain, the preceding diagram has been drawn with the same concepts that we saw in the previous scenarios, for better understanding.

Tip: Starting from Solr 4.10.0, the download bundle contains an interactive, wizard-like command-line setup for a sample SolrCloud installation. A step-by-step guide for this is available at https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud.

The following sections will describe the relevant aspects of SolrCloud.

Cluster management

Apache ZooKeeper was introduced in SolrCloud for cluster coordination and configuration. This means it is a central actor in this scenario, providing discovery, configuration, and lookup services for other components (including clients) to gather information about the Solr cluster.

Apache ZooKeeper, being a central component, can be organized in a cluster itself (as depicted in the previous diagram) in order to avoid a single point of failure. A cluster of ZooKeeper nodes is called an ensemble.

Tip: For more information about Apache ZooKeeper, visit http://zookeeper.apache.org, the project home page.

Replication factor, leaders, and replicas

In the preceding diagram, we have only one core (C1) with three shards (S1, S2, and S3). Now, the main difference between the previous distributed scenario (where we met shards) and this scenario is that here, there's a copy of each shard in every node. That copy is called a replica. In this example, we have three copies for each shard, but this is just for simplicity; you can have as many copies as you want.

More specifically, SolrCloud has a property called replication factor, which determines the total number of copies in the cluster for each shard. Among the copies, one is elected as the leader (the letter “L” on C1/S1 on the first node) while the remaining ones are replicas (the letter “R”).

Tip: In the preceding diagram, the replication factor is 3 and it is equal to the number of nodes. Keep in mind that this is a coincidence; those measures could be different, and they actually depend on your cluster configuration and needs.

This replication feature satisfies three important nonfunctional requirements: load balancing, high availability, and backup. We have already described how the classic replication mechanism provides load balancing. Having the same data within more than one node allows a searcher to issue query requests to those nodes in a round-robin fashion, thus expanding the overall capacity of the system in terms of queries per second. Here, the context is the same; each shard, regardless of whether it is a leader or a replica, can be found on n nodes (where n is the replication factor); therefore, a client can use those nodes for load balancing requests.

High availability is a direct consequence of the redundancy introduced with shard replication. The presence of the same data (and the same search services) on several nodes means that, even if one of those nodes crashes, a client can continue to send requests to the remaining nodes.
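Load balancing and failover, as described above, can be sketched from the client's point of view as a round-robin selector that skips nodes known to be down. This is an illustration only: the class RoundRobinNodes is hypothetical, and how a client learns that a node is down (for example, from failed requests or from ZooKeeper) is outside the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

/** Round-robin selection over the nodes that host a copy of the same data. */
final class RoundRobinNodes {

    private final List<String> nodes;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinNodes(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.nodes = new ArrayList<>(nodes);
    }

    /** Returns the next node in round-robin order, skipping the nodes known to be down. */
    String next(Set<String> downNodes) {
        for (int attempt = 0; attempt < nodes.size(); attempt++) {
            int position = Math.floorMod(counter.getAndIncrement(), nodes.size());
            String candidate = nodes.get(position);
            if (!downNodes.contains(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("no live node available");
    }

    public static void main(String[] args) {
        RoundRobinNodes balancer = new RoundRobinNodes(Arrays.asList("node1", "node2", "node3"));
        for (int i = 0; i < 4; i++) {
            System.out.println(balancer.next(Collections.<String>emptySet()));
        }
        // node2 has crashed: requests keep flowing to the remaining nodes.
        System.out.println(balancer.next(Collections.singleton("node2")));
    }
}
```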

The redundancy introduced with the replication also works as a backup mechanism. Having the same things in several places provides a better guarantee against data loss. After all, this is the underlying principle of the popular cloud data services (for example, Dropbox, iCloud, and Copy).

Durability and recovery

Each node maintains a write-ahead transaction log, where any change is recorded before being applied to the index. Therefore, the transaction log is available for leaders and replicas, and it will be used to determine which content needs to be part of a chosen replica during synchronization. For instance, when a new replica is created, it refers to its leader and its transaction log to know which content to get.

The transaction log will also be used when restarting a server that didn't shut down gracefully. Its content will be “replayed” in order to synchronize local leaders and replicas.
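The write-ahead principle and the replay after an ungraceful shutdown can be illustrated with a toy model. This sketch makes strong simplifying assumptions: the class WriteAheadLogSketch is hypothetical, the log is an in-memory list standing in for a durable file, and the whole log is replayed, whereas a real system only needs to replay the changes that were not yet safely committed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy write-ahead log: every change is recorded before it is applied to the index. */
final class WriteAheadLogSketch {

    /** Stand-in for the durable transaction log; it survives the simulated crash. */
    private final List<String[]> transactionLog = new ArrayList<>();
    /** Stand-in for the uncommitted index state; it is lost on a crash. */
    private Map<String, String> index = new TreeMap<>();

    void add(String id, String content) {
        transactionLog.add(new String[] {"add", id, content}); // 1. record the change
        index.put(id, content);                                // 2. then apply it
    }

    void delete(String id) {
        transactionLog.add(new String[] {"delete", id, null});
        index.remove(id);
    }

    /** Simulates a server that didn't shut down gracefully. */
    void crash() {
        index = new TreeMap<>();
    }

    /** Replays the logged changes, in order, to rebuild the lost state. */
    void replay() {
        for (String[] entry : transactionLog) {
            if (entry[0].equals("add")) {
                index.put(entry[1], entry[2]);
            } else {
                index.remove(entry[1]);
            }
        }
    }

    Map<String, String> index() {
        return index;
    }

    public static void main(String[] args) {
        WriteAheadLogSketch node = new WriteAheadLogSketch();
        node.add("1", "first album");
        node.add("2", "second album");
        node.delete("1");
        node.crash();
        node.replay();
        System.out.println(node.index());
    }
}
```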

Tip: Write-ahead logging is widely used in distributed systems. For more information about it, see https://cwiki.apache.org/confluence/display/solr/NRT%2C+Replication%2C+and+Disaster+Recovery+with+SolrCloud

The transaction log path can be configured in an appropriate section of the solrconfig.xml file.

The new terminology

Now that the main features of SolrCloud have been explained, we can stop thinking about it as an evolution of the shard scenario and cover its own terminology:

| Term | Description |
|---|---|
| Node | This is a Java Virtual Machine running Solr. |
| Cluster | A set of Solr nodes that form a single unit of service. |
| Shard | We previously defined a shard as a vertical subset of the index, that is, a subset of all documents in the index. A shard is a single copy of that subset. In SolrCloud, it can be a leader or a replica. |
| Partition/slice | A subset of the whole index replicated on one or more nodes. A slice is basically composed of all shards (leader and replicas) belonging to the same subset. |
| Leader | Each shard has one node identified as its leader. This role is crucial for the update workflow. All the updates belonging to a partition route through the leader. |
| Replica | The replication factor determines the total number of copies each shard has. Among all of those copies, one is elected as the leader, while the others are called replicas. While querying can be done across all shards, updates are always directed (or forwarded by replicas) to leaders. |
| Replication factor | The number of copies of a shard (and hence, of a document) maintained by the cluster. |
| Collection | A core that is logically and physically distributed across the cluster. In our example, we have only one collection (C1). |

Administration console

In a SolrCloud deployment, the administration console of each node will show an additional menu item called Cloud, where it's possible to get an overall view of the cluster. You can choose between several graphic representations of the cluster (tree, graph, and radial), but all of them have a common aim: giving an immediate overview of the cluster in terms of nodes, shards, and collections. This is a screenshot from the administration console of the SolrCloud used in this section:

Collections API

The Collections API is used to manage the cluster, including collections, shards, and metadata about the cluster. This interface is composed of a single HTTP service endpoint located at http://<hostname>:<port>/<contextroot>/admin/collections.

The Collections API accepts an action parameter, which is a mnemonic code associated with the command that we want to execute. Each command has its own set of parameters that depend on the goal of the command. The following table lists the allowed values for the action parameter (that is, the available commands):

| Action | Description |
|---|---|
| CREATE | Creates a new collection. |
| RELOAD | Reloads a collection. This is used when a configuration has been changed in ZooKeeper. |
| DELETE | Deletes a collection. |
| LIST | Returns the names of the collections in the cluster. |
| CREATESHARD | Creates a new shard. |
| SPLITSHARD | Splits an existing shard into two new shards. |
| DELETESHARD | Deletes an inactive shard. |
| CREATEALIAS | Creates or replaces an alias for an existing collection. |
| DELETEALIAS | Deletes an alias. |
| ADDREPLICA | Adds a new replica for a given shard. |
| DELETEREPLICA | Deletes a replica of a shard. |
| CLUSTERPROP | Adds, edits, or deletes a cluster property. |
| MIGRATE | Moves documents between collections. |
| ADDROLE | Adds a role to a node. At the time of writing this book, the only supported role is overseer. This is the cluster leader responsible for shard assignments and node management operations. |
| REMOVEROLE | Removes a role from a node. |
| OVERSEERSTATUS | Returns the current status of the overseer, including some stats about service calls (for example, create collection and create shard). |
| CLUSTERSTATUS | Returns the cluster status, including shards, collections, replicas, aliases, and cluster properties. |
| REQUESTSTATUS | Returns the status of those requests that have been executed asynchronously (for example, MIGRATE, SPLITSHARD, and CREATECOLLECTION). |
| ADDREPLICAPROP | Adds or replaces a replica property. |
| DELETEREPLICAPROP | Deletes a replica property. |
| BALANCESHARDUNIQUE | Distributes a given property evenly among the physical nodes that make up a collection. |

The complete list of parameters for each command is available at https://cwiki.apache.org/confluence/display/solr/Collections+API.
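Since every command goes through the same endpoint, a request is just the endpoint URL plus the action and its parameters. The sketch below builds such a URL; the class CollectionsApiUrl, the base URL, and the collection name albums are hypothetical, while name, numShards, and replicationFactor are parameters of the CREATE command (check the page linked above for the full list supported by your Solr version).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Builds request URLs for the single Collections API endpoint. */
final class CollectionsApiUrl {

    /** solrBaseUrl corresponds to http://<hostname>:<port>/<contextroot>. */
    static String build(String solrBaseUrl, String action, Map<String, String> parameters) {
        StringBuilder url = new StringBuilder(solrBaseUrl)
                .append("/admin/collections?action=").append(encode(action));
        for (Map.Entry<String, String> parameter : parameters.entrySet()) {
            url.append('&').append(encode(parameter.getKey()))
               .append('=').append(encode(parameter.getValue()));
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> parameters = new LinkedHashMap<>();
        parameters.put("name", "albums");          // hypothetical collection name
        parameters.put("numShards", "3");
        parameters.put("replicationFactor", "3");
        System.out.println(build("http://localhost:8983/solr", "CREATE", parameters));
    }
}
```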

Distributed search

Queries can be sent to any node, which performs a full distributed search across the cluster with load balancing and failover. SolrCloud also allows partial queries, that is, queries executed against a group of shards, a list of servers, or a list of collections.

Tip: If you are using Java on the client side, CloudSolrServer in SolrJ greatly simplifies communication between the client, ZooKeeper, and the cluster. As a developer, you will work with the usual SolrServer interface.

Cluster-aware index

A drawback of the first distributed scenario we met (that is, shards) was that a client that wants to issue an update request needs to explicitly point to the target shard. This is no longer the case in a SolrCloud context because, for a given shard, there could be more than one copy (that is, a leader and zero or more replicas). So the update path becomes the following:

Updates can be sent to any node in the cluster

If the target node is the leader of the shard owning the document, the update is executed there, and then it is forwarded to all replicas

If the target node is a replica, then the update request is forwarded to its leader, and the flow described in the previous point applies

Tip

The CloudSolrServer in SolrJ asks ZooKeeper about the leader's location before sending updates. Thus, requests are always targeted at leaders, avoiding additional network round-trips.
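The update path described in this section can be sketched as a tiny simulation. This is only an illustrative model in plain Java, not Solr's internal code: the UpdateRouting and Shard classes, the route method, and the node names are all hypothetical, and the model covers a single shard.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the SolrCloud update path for one shard.
final class UpdateRouting {

    /** One shard: a leader plus zero or more replicas (node names are made up). */
    static final class Shard {
        final String leader;
        final List<String> replicas;

        Shard(String leader, List<String> replicas) {
            this.leader = leader;
            this.replicas = replicas;
        }
    }

    /** Returns, in order, the steps an update goes through after reaching receivingNode. */
    static List<String> route(Shard shard, String receivingNode) {
        List<String> steps = new ArrayList<>();
        // A node that is not the leader first hands the update over to the leader.
        if (!receivingNode.equals(shard.leader)) {
            steps.add(receivingNode + " forwards to leader " + shard.leader);
        }
        // The leader executes the update, then forwards it to every replica.
        steps.add(shard.leader + " executes update");
        for (String replica : shard.replicas) {
            steps.add(shard.leader + " forwards to replica " + replica);
        }
        return steps;
    }

    public static void main(String[] args) {
        Shard shard = new Shard("node1", Arrays.asList("node2", "node3"));
        for (String step : route(shard, "node2")) {
            System.out.println(step);
        }
    }
}
```

Sending the update straight to the leader saves the first hop, which is the round-trip that the preceding tip says CloudSolrServer avoids.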

Summary

In this chapter, we described various ways in which you can deploy Solr. Each deployment scenario has specific features, advantages, and drawbacks that make it ideal for one context and a poor fit for another. The good news is that the different scenarios are not mutually exclusive; they follow an incremental approach. Ideally, you would start right away with the scenario that best fits your needs. However, unless your requirements are clear right from the start, you can begin with a simple configuration and then change it, depending on how your application evolves.

In the next chapter, we will walk through some useful add-ons that are not part of the core distribution but are included in the Solr download bundle.

Chapter 7. Solr Extensions

Most popular open source projects include a contrib folder containing several extra modules that solve common use cases. In Solr, you can find such modules within the download bundle, as depicted in the following screenshot:

Suppose your data is in a relational database, an XML file with a custom format, or a mail server; you need to index data coming from a Content Management System (such as Drupal, Joomla!, or WordPress); or you have rich documents (such as PDFs or Microsoft Office documents) and you want to do some kind of automatic keyword extraction. In general, these requirements are not covered by the core part of Solr. You will have to plug in and configure the corresponding contribution modules.

The aim of this chapter is to describe such modules. In order to do that, we will make use of a preloaded sample Solr instance with those extensions. To start this instance, check out the source project associated with the chapter, change the directory to the ch7 folder, and type this from the command line:

# mvn clean package cargo:run

If you checked out the project using Eclipse, you might have noticed that, under the src/dev/eclipse folder, there is a preconfigured launcher. Right-click on it and choose the Debug as… menu item.

Regardless of the way you choose, you will see something like this at the end:

[INFO] Jetty 8.1.15.v20140411 Embedded started on port [8983]
[INFO] Press Ctrl-C to stop the container…

This means that the sample instance is up and running. This chapter will cover the following points:

Importing data from several data sources

Text and metadata extraction from digital documents

Language identification

Solritas (that is, Solr and Velocity)

Other contrib modules

DataImportHandler

The DataImportHandler is a module that enables Solr to load data from several types of data sources. The most frequent type of storage where applications put their data is undoubtedly a relational database, but in general, we could have a lot of scenarios here: filesystems, websites, emails, FTP servers, LDAP, NoSQL databases, and so on.

The DataImportHandler module, besides providing a lot of ready-to-use connectors, is an extensible framework where developers are free to inject their storage-specific connector logic. The configuration happens in two different places: the first is the solrconfig.xml file (as usual), where the handler is declared as follows:

<requestHandler name="/import"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dih-config.xml</str>
  </lst>
</requestHandler>

The second is the handler configuration file (in the preceding example, we called it dih-config.xml). Although the specific content of that file could vary, mainly depending on the kind of data source we are using, the building blocks of a DataImportHandler domain are data sources, documents, entities, fields, transformers, and processors.

Data sources

A data source is a collection of records that store data. Although you are probably thinking of relational databases, data sources can also be associated with other kinds of sources and protocols, such as websites (HTTP), FTP servers, LDAP, mail servers, and so on.

A data source declaration is probably the first thing you will meet in a DataImportHandler configuration file. First of all, you must declare where your data is:

<dataSource
    type="JdbcDataSource"
    driver="com.mysql.jdbc.Driver"
    url="jdbc:mysql://host/database-name"
    user="database_username"
    password="database_password"/>
<dataSource
    type="FileDataSource"
    encoding="UTF-8"/>

Note that it's possible to declare more than one data source (for example, a database and a filesystem or two different databases). Each data source has its own specific properties that depend on its nature. The following table describes the available data sources:

Name Description

JdbcDataSource Connects to a database (a direct connection or JNDI data source) using a JDBC driver. Note that Solr doesn't ship with any JDBC driver. You must obtain it separately and put that library on the server classpath or in the core lib folder.

URLDataSource Reads character files using HTTP.

BinURLDataSource Reads binary files using HTTP.

FileDataSource Reads from local character files.

BinFileDataSource Reads from local binary files.

ContentStreamDataSource Reads from the ContentStream of a POST request using java.io.Reader.

BinContentStreamDataSource Reads from the ContentStream of a POST request using java.io.InputStream.

FieldReaderDataSource Used in conjunction with other data sources, when a given field contains text that needs further processing (for example, when it contains an XML document).

FieldStreamDataSource Used in conjunction with other data sources, when a given field contains binary content that needs further processing (for example, when it contains the value of a BLOB database column).

Documents, entities, and fields

Mapping between external data and Solr is done using documents, entities, and fields.

A document represents a logical type (such as products, books, and associations). It contains one or more entities.

Entities are called root entities or sub-entities depending on their nesting level. Root entities are direct children of a document. Sub-entities are children of another entity. They have a relationship with their parents; within their configuration, it's possible to use an expression language to refer to their parents.

Fields are the concrete places where the mapping between the external data source and the Solr document occurs. The following figure schematizes these relationships:

A single document can have one or more root entities. Each entity defines the logic to gather its data and populate its fields.

In the following example, a Solr schema contains books. Each book consists of an identifier (id), a title (title), and one or more authors. There are two database tables, BOOKS and AUTHORS, with a 1:n relationship (this means that a book can have more than one author).

First, let's see how the root entity (the book) is defined:

<document name="books">
  <entity name="book" dataSource="my-ds"
      query="SELECT BOOK_ID, TITLE FROM BOOKS" onError="skip">
    <field column="BOOK_ID" name="id"/>
    <field column="TITLE" name="title"/>
  </entity>
</document>

As you can see, the entity is associated with a data source called my-ds. It is configured with a query, and for each record of the resulting ResultSet, we are interested in two fields: BOOK_ID and TITLE. They are mapped to the id and title fields in the Solr schema.

Tip

If the name of the column (or the alias) in the ResultSet coincides with the name of the Solr field (case insensitive), the <field> declaration can be omitted. Solr will perform the mapping automatically. So, in the preceding example, the TITLE mapping can be removed.

Now, since the cardinality of the relationship between books and authors is 1:n, we need to define a sub-entity. For each book, we must query the data source again to find the corresponding authors:

<entity name="book" dataSource="my-ds"
    query="SELECT BOOK_ID, TITLE FROM BOOKS" onError="skip">
  <field column="BOOK_ID" name="id"/>
  <field column="TITLE" name="title"/>
  <entity name="author" dataSource="my-ds"
      query="SELECT NAME FROM AUTHORS WHERE BOOK_ID=${book.BOOK_ID}">
    <field column="NAME" name="author"/>
  </entity>
</entity>

The author sub-entity declares a query on the AUTHORS table. It uses a simple expression language to refer to the identifier of the current (parent) book:

${<parent entity name>.<database alias or column name>}

Obviously, this is a really simplified example. In a real production scenario, you will probably meet complicated relational schemas, but the DataImportHandler logic will always be the same: detect and configure entities and fields in order to denormalize your data model.
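To make the ${entity.column} substitution more concrete, here is a minimal sketch of how such a placeholder could be resolved against the current parent row. It is an illustration in plain Java under stated assumptions, not the DataImportHandler's actual resolver: the PlaceholderResolver class, the resolve method, and the sample BOOK_ID value are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical resolver for ${<entity>.<column>} placeholders in a sub-entity query.
final class PlaceholderResolver {

    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{(\\w+)\\.(\\w+)\\}");

    /** Replaces each placeholder with the column value of the named entity's current row. */
    static String resolve(String query, Map<String, Map<String, Object>> currentRows) {
        Matcher matcher = PLACEHOLDER.matcher(query);
        StringBuffer resolved = new StringBuffer();
        while (matcher.find()) {
            Map<String, Object> row = currentRows.get(matcher.group(1));
            Object value = (row == null) ? null : row.get(matcher.group(2));
            if (value == null) {
                throw new IllegalArgumentException("Unresolved placeholder: " + matcher.group());
            }
            matcher.appendReplacement(resolved, Matcher.quoteReplacement(String.valueOf(value)));
        }
        matcher.appendTail(resolved);
        return resolved.toString();
    }

    public static void main(String[] args) {
        // The row currently fetched by the parent "book" entity (made-up value).
        Map<String, Object> bookRow = new HashMap<>();
        bookRow.put("BOOK_ID", 42);
        Map<String, Map<String, Object>> currentRows = new HashMap<>();
        currentRows.put("book", bookRow);

        System.out.println(resolve(
                "SELECT NAME FROM AUTHORS WHERE BOOK_ID=${book.BOOK_ID}", currentRows));
    }
}
```

The sub-entity query is resolved once per parent row, which is why a 1:n relationship results in one extra query for each book.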

Transformers

A transformer is a function associated with an entity (root or nested) that can manipulate the fields fetched by the entity itself. The transformer must be declared as an attribute of the target entity:

<entity name="author" transformer="script:createAuthorFullName">

The corresponding function will be called for each set of fields (record) fetched by the query associated with the entity. The function has complete control over the fetched record. It can remove, add, or replace fields.

In the previous example, the Solr schema includes an author field that is supposed to hold the complete name of the author (for example, Dante Alighieri). Now let's imagine that the AUTHORS table contains two separate columns instead: FIRST_NAME and LAST_NAME. With the help of the built-in script transformer, we can write a simple JavaScript function to combine the two fields:

<script><![CDATA[
  function createAuthorFullName(record) {
    var first = record.remove('FIRST_NAME');
    var last = record.remove('LAST_NAME');
    record.put('author', first + ' ' + last);
    return record;
  }
]]></script>

Note how we manipulated the current record by adding a new field (author) and removing the LAST_NAME and FIRST_NAME fields.

The following table lists the available built-in transformers:

Name Description

ScriptTransformer Executes a function written in JavaScript or another scripting language supported by Java.

DateFormatTransformer Creates java.util.Date instances from string literals.

HTMLStripTransformer Strips off HTML tags from field values.

LogTransformer Logs messages using a given template.

NumberFormatTransformer Creates number instances from string literals.

RegexTransformer Uses regular expressions to manipulate data in fields.

TemplateTransformer Puts values in a column by resolving an expression containing other columns. For example, the concatenation we got with the ScriptTransformer can also be done using this transformer:

<field column="author" template="${author.FIRST_NAME} ${author.LAST_NAME}"/>

A transformer is simply a class that extends org.apache.solr.handler.dataimport.Transformer so, if the built-in portfolio doesn't meet your needs, it is always possible to create a custom implementation.
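As a rough sketch of what such a custom implementation might do, the following plain-Java method repeats the row manipulation of the JavaScript function shown earlier. It deliberately does not extend the Solr Transformer class, so that it compiles without Solr on the classpath; in a real transformer, this logic would live in the subclass's overridden row-transformation method. The AuthorFullNameSketch class and its method are hypothetical names chosen for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a custom transformer; it only models the row manipulation.
final class AuthorFullNameSketch {

    /** Removes FIRST_NAME and LAST_NAME from the row and adds a combined author field. */
    static Map<String, Object> transformRow(Map<String, Object> row) {
        Object first = row.remove("FIRST_NAME");
        Object last = row.remove("LAST_NAME");
        row.put("author", first + " " + last);
        return row;
    }

    public static void main(String[] args) {
        // A record as it might come back from the AUTHORS table.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("FIRST_NAME", "Dante");
        row.put("LAST_NAME", "Alighieri");
        System.out.println(transformRow(row)); // prints {author=Dante Alighieri}
    }
}
```

A compiled transformer is mostly worth the effort when the logic is too complex, or too performance-sensitive, for an inline script.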

Entity processors

Each entity is handled by a so-called EntityProcessor that defaults to SqlEntityProcessor. This is because the relational database is the most popular type of data source.

However, when using a different data source such as HTTP, files, or streams, the entity management logic has its own specific requirements that most probably fall outside the area covered by SqlEntityProcessor. In these cases, you can override the default setting by explicitly declaring an EntityProcessor for a given entity.

As usual, there are a lot of built-in EntityProcessor implementations, but it is always possible to create a custom one by extending the org.apache.solr.handler.dataimport.EntityProcessor class.

The following table lists and describes the available entity processors:

Name Description

SqlEntityProcessor This is the default entity processor assigned to each entity. It provides support to read and cache data from databases. It is used in conjunction with JdbcDataSource.

FileListEntityProcessor Enumerates the list of files from a filesystem based on criteria specified in the associated entity (for example, base path, recursive, and filename pattern).

LineEntityProcessor Reads from a data source on a line-by-line basis and produces a field called rawLine for each line read.

MailEntityProcessor Handles emails and attachments from POP3 or IMAP sources.

PlainTextEntityProcessor Reads from a data source and returns a field called plainText. This field contains a string representing the source content.

SolrEntityProcessor Reads values from another Solr instance using SolrJ. Each returned record is a SolrDocument instance.

TikaEntityProcessor Extracts metadata and text from rich documents by means of Apache Tika. Later, we will see the Content Extraction Library, which also uses Tika as the extraction engine.

XPathEntityProcessor Uses a streaming XPath parser to extract values from XML documents.

Event listeners

The document element in the DataImportHandler configuration allows us to declare two event listeners to intercept the most relevant events of a data import life cycle, onImportStart and onImportEnd:

<document
    onImportStart="com.foo.MyImportStartEventListener"
    onImportEnd="com.foo.MyImportEndEventListener">

The event listeners must implement the org.apache.solr.handler.dataimport.EventListener interface, which gives them access (by means of an org.apache.solr.handler.dataimport.Context instance) to most DataImportHandler objects and event statistics such as documents skipped, indexed, failed, and so on.

Content Extraction Library

The Content Extraction Library (also known as Solr Cell) integrates the popular Apache Tika framework to detect and extract metadata and text from a large variety of file types such as PDF, Microsoft Office, LibreOffice, and OpenOffice documents.

Apache Tika provides a façade parser interface on top of several low-level frameworks that are able to manage and manipulate specific file types (for example, PDFBox for PDFs and Apache POI for Microsoft documents). Its simple interface also provides automatic MIME type detection, so the framework itself is able to select the correct parser for a given file.

On the Solr side, a dedicated ExtractingRequestHandler is in charge of getting the input data (files) sent by clients and extracting metadata and text by means of Tika.

The configuration of ExtractingRequestHandler follows the same procedure that we saw for the other handlers. Specifically, it has to be declared in solrconfig.xml, as follows:

<requestHandler name="/update/extract"
    class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
  </lst>
</requestHandler>

Solr Cell has several options that can be configured to fine-tune its behavior. Most of them are related to metadata handling, field name mapping, and custom Tika configuration.

Tip

For a complete list of all configuration parameters, go to https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika

The src/solr/solr-home/example-data folder in the example project contains a document that can be sent to Solr Cell. Open a shell and type the following (replace the PROJECT_HOME placeholder with your ch7 project local path):

# curl "http://localhost:8983/solr/example/update/extract?commit=true" -F data=@PROJECT_HOME/ch7/src/solr/solr-home/example-data/libreoffice-writer.odt

Wait for a moment, and then you should see a response like this:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">572</int>
  </lst>
</response>

The document (the LibreOffice document in this case, but you can also try other files) has been indexed. To verify this, open the browser and type the following URL:

http://127.0.0.1:8983/solr/example/select?q=stream_name:libreoffice-writer.odt&indent=true

The XML response shows the extracted text (under the text attribute) and all the metadata fields that have been detected for that document.

Language Identifier

The Language Identifier extension detects the language (or languages) of fields belonging to a given document. This is a very useful add-on to use in conjunction with the previously described extraction library, to get additional information about data that has been indexed.

The component is implemented as an UpdateRequestProcessor subclass that intercepts and analyzes the incoming data:

<processor
    class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory">
  <str name="langid.fl">text</str>
  <str name="langid.langField">language</str>
  <str name="langid.fallback">en</str>
</processor>

As you can see, this processor can be configured with several options. We can declare the fields of the incoming documents that must be analyzed, the name of the field that will hold the results of language detection, and a default fallback language in case no detection is possible.

Tip

In the example project associated with this chapter, you will find a solrconfig.xml file where the chain is already defined but the UpdateRequestProcessor is commented out. Just remove the comment markers, reload the core using the Administration Console, and reindex the documents under the example-data folder, following the same procedure described in the previous section. At the end, you will see an additional "language" field in each document; that is the result of the language detection component.

You should know that declaring the processor within the solrconfig.xml file is not enough. We need to insert it into an update request processor chain, and finally associate that chain with an UpdateRequestHandler. Only the update requests received by that handler will pass through the language detection analysis chain.

Rapid prototyping with Solritas

Solritas is the name of a contribution module that integrates Solr with Apache Velocity. It is basically a response writer that uses the Apache Velocity template engine to render Solr responses as a graphical user interface.

A set of ready-to-use Velocity templates is combined with Solr responses in order to provide a search GUI with a lot of features (for example, faceting, highlighting, and autocompletion).

Tip

You can find the Velocity templates under the src/solr/solr-home/example/conf/velocity folder of the ch7 project, or under the example/solr/collection1/conf/velocity folder of the Solr download bundle.

As this GUI is produced directly by transforming Solr responses, there's no need for an external web application to execute searches and graphically see the corresponding results.

Okay, one could now say, "This is already possible with the Solr REST services", but that is definitely more technically complex, and the search results are displayed in XML, JSON, or some other raw format. Here, a more user-friendly interface is provided, as shown in the following screenshot:

That makes Solritas an ideal choice to build rapid prototypes. The sample instance you started at the beginning of this chapter has Solritas configured in solrconfig.xml. It responds to the /solritas endpoint, so after indexing some data as described in the previous section, open your browser and type http://127.0.0.1:8983/solr/example/solritas.

Tip

The Velocity templates have been copied from the Solr download bundle, so some areas (such as Google Maps widgets, spatial queries, and range queries) might not be visible or might not make sense with the chapter's sample data. If you want to see all of them in action, just start the Solr example in the download bundle and navigate to http://127.0.0.1:8983/solr/browse.

You should see the Solritas results page, which is preloaded with a *:* query by default.

Other extensions

The contrib folder contains other modules or plugins that are briefly described in the following sections.

Clustering

The clustering module is a framework used to plug in third-party (clustering) implementations. At the time of writing this book, it provides support for clustering search results using the Carrot2 project.

The Solr example that comes with the download bundle already contains a ClusteringComponent within the solrconfig.xml configuration file. The declaration happens in two phases. First, the component has to be configured:

<searchComponent
    name="clustering"
    enable="${solr.clustering.enabled:false}"
    class="solr.clustering.ClusteringComponent">
  <lst name="engine">
    <str name="name">lingo</str>
    <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
    <str name="carrot.resourcesDir">clustering/carrot2</str>
  </lst>
</searchComponent>

After this, as with any other SearchComponent, you should enable it by including its name in the RequestHandler instance where it is supposed to run:

<requestHandler name="/myRequestHandler" class="solr.SearchHandler">
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>

In this way, it can contribute to search results by adding a "clusters" section, like this:

<response>
  <result>
  </result>
  <arr name="clusters">
    <arr name="labels">
      <str>iPod</str>
    </arr>
    <double name="score">1.3174612693376382</double>
    <arr name="docs">
      <str>F8V7067-APL-KIT</str>
      <str>IW-02</str>
    </arr>
    <arr name="labels">
      <str>Hard Drive</str>
    </arr>
  </arr>
</response>

If you want to try this yourself, open a shell and type the following commands:

# cd $INSTALL_DIR/example
# java -Dsolr.clustering.enabled=true -jar start.jar

These will start Solr with the ClusteringComponent enabled. Now, in another shell, type this:

# cd $INSTALL_DIR/example/exampledocs
# ./post.sh *.xml

Finally, open a browser and execute this query: http://localhost:8983/solr/clustering?q=*:*&rows=10

You should get a response similar to the preceding example, with the "clusters" section at the bottom.

UIMA Metadata Extraction Library

This module integrates Apache UIMA in Solr by providing a powerful Metadata Extraction Library that can be used for tasks such as automatic keyword extraction and Named Entity Recognition (for example, places, names, concepts, and dates).

The plugin can be provided either as an UpdateRequestProcessor subclass, to decorate the index process chain, or as a set of Tokenizers/Filters, to add such behavior in the (index or query) text analysis phase.

Using this module, you can enrich your Solr documents with additional metadata information extracted from the input data. UIMA provides an analysis engine that involves several components arranged in a pipeline. The default pipeline supports the use of existing analysis engines such as Alchemy or OpenCalais. Keep in mind that these engines are not free of charge, but they provide a free trial period. You can register and obtain an API key that must be configured in the solrconfig.xml file. Other components are used for language and sentence detection.

Note

Under the contrib/uima folder, you will find a README file with detailed information about the Solr UIMA module usage.

The UIMA UpdateRequestProcessor intercepts the documents that are being indexed and sends them to its analysis pipeline. Those documents will be automatically enriched with extracted information such as sentences, languages, or named entities (for example, places or names).

MapReduce

The MapReduce contrib module provides integration with Apache Hadoop. MapReduce is the name of a paradigm (programming model) that is implemented in Apache Hadoop to process large datasets with a parallel and distributed algorithm.

The contribution contains a MapReduce job to build Solr indexes and merge them into a Solr cluster.

Summary

In this chapter, we illustrated a set of contribution modules that are not part of the Solr core but are definitely useful in a lot of real scenarios. The Solr download bundle contains all of them, and their installation is very easy. Each module folder has a README file that guides you through the installation and setup steps (basically, it's just a matter of copying, pasting, and configuring).

In the next chapter, we will conclude our Solr path with an overview of the Solr codebase. You will learn how to work with it and eventually how to contribute to the open source community process.

Chapter 8. Contributing to Solr

A friend of mine used to say, "Is there a better way to start a new year than contributing to an open source project?" I strongly agree; a great way to get involved in the open source world is to contribute to the projects you're using.

As a user of open source software, you are already part of that world, an important part that makes that software useful. But there's more; you can delve more deeply into what actually happens behind the scenes.

By the end of this chapter, you will have a good understanding of the following topics:

The constituent pieces of the open source world

The Apache contribution process

How to work with Solr source code in your IDE

Identifying your needs

Why are you interested in the open source contribution process? Why do you want to have the Solr source code in your IDE? These are crucial questions you should answer before doing all that is described in this chapter. In my opinion, you could fall under one of these scenarios:

Curiosity: You want to inspect and see with your own eyes how things are working behind the scenes.

Bug fixing: You want to fix a bug that you met in your Solr installation. In this way, you will satisfy your customer and the community will benefit from your work.

Improvement: You've got an idea about an interesting feature not yet implemented. Probably, a customer requirement led to that idea, and you believe that it could be useful for other users if (once implemented) it were integrated in Solr.

Wanting to contribute: You simply want to contribute by fixing an existing issue and participating in the development/contribution process.

While curiosity could be a good reason to start investigating source code, sooner or later (and, I would add, most probably), you will fall into one of the other categories. At that time, you will necessarily start communicating with other people and the communities associated with the project.

Tip

You can find a general introduction to the Apache contribution process at http://www.apache.org/foundation/getinvolved.html.

That interaction will involve some general aspects such as issue tracking, mailing lists, software development, and so on. Once you have identified your needs and goals, you can look at the upcoming sections for a description of those cross-cutting concepts.

An example – SOLR-3191

In 2013, I was working on an Online Public Access Catalogue (OPAC) project for a big library. The schema definition became huge very soon, because MARC, the standard representation for bibliographic records, is an old and proven standard that classifies each minimal piece of information about a catalog item.

Obviously, our customer required all that richness in the search application, so we started with a small schema and quickly ended up with a lot of fields.

Another requirement was the capability to download each item in MARCXML format (MARCXML is the XML representation of a MARC record) in the end user application. So, in order to satisfy that requirement, we put the whole MARC representation in a dedicated stored field called, not surprisingly, marc_xml.

What was the problem? On the Solr side, we defined a lot of SearchHandler instances, one for each kind of search (for example, any keyword, author, title, or subject). As you know, for each handler we have to declare all (stored) fields that must be in the search results using the fl parameter.

In the first approach, we simply put a wildcard (*) as the value of the fl parameter, as most of those fields were needed in the user interface. But after it had been running for a while in production, the IT department, in charge of monitoring the system, raised an issue about the network traffic between the frontend application and the Solr server. After doing some analysis, we discovered a lot of records with a huge marc_xml field returned to the client. “OK,” said one of the IT guys to us, “just exclude the marc_xml field from the fl parameter.”

The fl parameter accepts a list of fields that must be returned, but there’s no way to tell it what must not be in the search results. Eight handlers were defined in the solrconfig.xml file, and for each of them (later, we discovered the XInclude feature, but that’s another story), we had to declare all stored fields, excluding the marc_xml field. This was terrible and unmaintainable!
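The workaround described above, listing every stored field except marc_xml in each handler, can be sketched with a small helper. This is only an illustration in Python, not part of Solr: the build_fl function and every field name other than marc_xml are assumptions made for the example.

```python
def build_fl(stored_fields, excluded):
    """Build an explicit fl value: every stored field except the excluded ones.

    Hypothetical helper, not a Solr API. The original field order is preserved.
    """
    excluded = set(excluded)
    return ",".join(name for name in stored_fields if name not in excluded)


# Assumed schema: only marc_xml comes from the text, the other names are made up.
stored = ["id", "title", "author", "subject", "marc_xml"]
print(build_fl(stored, ["marc_xml"]))  # id,title,author,subject
```

Even when generated, such a value has to be rebuilt whenever the schema changes and copied into every handler, which is why an exclusion syntax looked attractive.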

After googling a bit, I found several people facing the same problem, so I decided to take a look at an existing JIRA issue. Thus, I met the (unsolved) SOLR-3191 issue at https://issues.apache.org/jira/browse/SOLR-3191, which describes the problem:

SOLR-3191 field exclusion from fl

I think it would be useful to add a way to exclude field from the Solr response. If I have for example 100 stored fields and I want to return all of them but one, it would be handy to list just the field I want to exclude instead of the 99 fields for inclusion through fl

So I thought to myself: why don’t you try to implement that feature? And I did what I’m going to describe in this chapter. If you take a look at that issue, you will see I submitted two patches and had some exchange with a couple of Solr guys.

Subscribing to mailing lists

If you haven’t subscribed to a Solr mailing list (or lists) yet, you should do that before going ahead. User and developer lists are the primary place where things such as doubts, questions, features, and bugs are discussed.

It’s mainly there that you should look to solve your problem and meet people with similar requirements. Like any other Apache project, Solr has the following mailing lists:

A user list – [email protected]
A dev list – [email protected]
A commits list – [email protected]

Every Solr user should be subscribed to the user list. This usually avoids the need to reinvent the wheel by getting ideas and solutions from users and developers.

The dev list is meant for listening to or participating in discussions on Lucene and Solr internals, developments, upcoming features, and so on. The focus here is more technical.

Finally, the commits list is used to receive notifications about every Solr or Lucene commit.

Subscribing to a list is very easy: send an email to the subscription address of the list you want ([email protected], [email protected], or [email protected]), and then follow the procedure described in the reply.

Signing up on JIRA

The issue tracker is another important building block of the open source contribution process. Whenever an idea, question, bug, or feature becomes something that could affect the code, a new JIRA issue is filed, and all things related to it (for example, tasks, discussions, patches, code, and commit logs) will be put there.

Issues in JIRA are public, so if you only want to read them there’s no need to have an account (you should have already read the SOLR-3191 issue on JIRA, without having an account).

However, if you want to participate in a discussion, post a patch, or create or update issues, you must sign up at https://issues.apache.org/jira/secure/Signup!default.jspa.

After that, you can sign in using the login form at https://issues.apache.org/jira/login.jsp.

That’s all! Welcome to the Apache Issue Tracker! Note that, before opening a new issue, it is always better to ping the dev list and discuss it. Maybe a similar issue already exists and someone is working on it.

Setting up the development environment

Following the same logic that was used in the previous chapters, I will assume you have Eclipse installed. If that is not the case, that is, if you followed the examples using some other IDE (for example, IntelliJ), a few steps could be a bit different.

In order to be able to modify, build, and run Solr from the source code, you need the following:

An IDE such as Eclipse or IntelliJ
A Subversion client, which can be a standalone client (such as the svn command-line tool or TortoiseSVN) or a plugin in your IDE (for example, Subclipse or Subversive)
Apache ANT (http://ant.apache.org/bindownload.cgi)

Version control

Subversion is an open source version control system that is used to maintain the source code of the Apache projects, including Solr.

As a first step, you need to check out the Solr source code from the SVN repository. Depending on your role, you should point to one of the following addresses:

http://svn.apache.org/repos/asf/lucene/dev/<branch>

https://svn.apache.org/repos/asf/lucene/dev/<branch>

As you can see, the only difference between the preceding links is the protocol. The first link, which uses http, is for anonymous checkout, and the other, which uses https, is for committers. Committers are those people who have commit rights, that is, active members of the development community with write permissions on the repository. I assume you don’t fall within this last category, so the correct link is the first.

The link also contains a <branch> placeholder. This must be replaced with the target version you will work on, which strictly depends on the task you would like to do. If you want to fix a bug in a past version (for example, 4.7.2), you should point to the corresponding branch. If you want to pick up an existing enhancement or bug that has been scheduled for the next major release, you should point to the trunk. The following table describes how the repository tree is organized (http://svn.apache.org/repos/asf/lucene/dev/):

Folder                        Description
branches                      Development branches.
branches/branch_5x            The development branch for the next version, 5.x.
branches/lucene_solr_3_6,     The development branches for versions that have been released. Apart from some tasks that have been scheduled for a given release, most of the development activities done in these branches are bug fixes.
branches/lucene_solr_4_10
tags                          When a new version is released, the corresponding source code is copied here, in a dedicated folder (for example, tags/lucene_solr_3_6_1 and tags/lucene_solr_4_10_3).
trunk                         This is the main center of development.

The target branch depends on what you would like to do. If you pick up an existing JIRA issue, you will find the affected version among its attributes. Besides, you may want to fix an issue in an older version (for example, 3.6.1) because your customer is using that specific version.

Keep in mind that most development tasks are done in the trunk and then ported to the corresponding active development branch (under the branches folder). Anyway, before starting, it is always recommended to ping the dev list explaining what you want to do.
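How the <branch> placeholder and the http/https choice combine can be sketched in a few lines of Python. The two base addresses and the folder names come from this section; the checkout_url function itself is a hypothetical helper written for illustration only.

```python
# Base addresses as given in this section.
ANONYMOUS_BASE = "http://svn.apache.org/repos/asf/lucene/dev"
COMMITTER_BASE = "https://svn.apache.org/repos/asf/lucene/dev"


def checkout_url(branch, committer=False):
    """Return the SVN address for a repository folder such as 'trunk'.

    Hypothetical helper: 'branch' is a path relative to the repository
    root, for example 'branches/branch_5x' or 'tags/lucene_solr_4_10_3'.
    """
    base = COMMITTER_BASE if committer else ANONYMOUS_BASE
    return f"{base}/{branch.strip('/')}"


print(checkout_url("trunk"))
print(checkout_url("branches/branch_5x"))
print(checkout_url("tags/lucene_solr_4_10_3", committer=True))
```

Non-committers would always leave committer at its default, which yields the read-only http address.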

Code style

One of the common problems in distributed development is agreeing on source code conventions: comments, naming conventions, and so on.

That’s the reason the Solr development team provided two useful configuration files, one for Eclipse and another for IntelliJ. These files can be imported into those IDEs to automate a lot of things such as indentation, brace positions, line wrapping, comments, and so on.

Pick up the file from one of the following addresses, depending on your favorite IDE:

Eclipse: http://people.apache.org/~rmuir/Eclipse-Lucene-Codestyle.xml
IntelliJ: http://people.apache.org/~erick/Intellij-Lucene-Codestyle.xml

In Eclipse, the configuration file can be imported by going to Window | Preferences | Java | Code Style | Formatter and then clicking on the Import button, as shown in the following screenshot:

After that, navigate to Java | Editor | Save Actions. Select the Perform the selected actions on save checkbox and the Format edited lines radio button, as shown in this screenshot:

Checking out the code

Once you have identified the target branch to work on, check out the source code using the svn command-line tool or your favorite tool (for example, TortoiseSVN).

SOLR-3191 was considered a new feature at that time, so I checked out the trunk. The current trunk requires Java 8 in order to build, so to execute the steps in this chapter, let’s point to a different branch (branch_5x). Open a shell and type the following commands:

# cd /work/solrdev

# svn checkout http://svn.apache.org/repos/asf/lucene/dev/branches/branch_5x solr_5

Bear in mind the following:

I’m not a committer, so I pointed to the read-only (http) address.
The name of the local folder that will contain the downloaded source is solr_5. If it doesn’t exist, it will be automatically created.
The /work/solrdev/solr_5 folder is a local working folder on my machine. You can choose whatever name you like.

When you execute that command, a lot of files will be downloaded. In the end, you should see something like this:

A    solr_5/solr/test-framework/src/java/overview.html
A    solr_5/.hgignore
 U   solr_5
Checked out revision 1651057.

Now the source code of Solr 5.x is on your machine.
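If the checkout is driven from a script, the final line of the output is a convenient success marker. The following Python sketch extracts the revision number from captured output; the checked_out_revision function is a hypothetical helper, and the sample text simply repeats the lines shown above.

```python
import re


def checked_out_revision(output):
    """Return the revision from 'Checked out revision N.', or None if absent."""
    match = re.search(r"^Checked out revision (\d+)\.$", output, re.MULTILINE)
    return int(match.group(1)) if match else None


# Sample output, copied from the checkout shown in this section.
sample = """A    solr_5/.hgignore
 U   solr_5
Checked out revision 1651057.
"""
print(checked_out_revision(sample))  # 1651057
```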

Creating the project in your IDE

Getting the source code is not enough, unless you want to develop your patch using Vim. You will have to create a project in your IDE. Assuming you are in the /work/solrdev/solr_5 folder you created in the previous step, type the following:

# ant clean test

The ant command will immediately fail because the build requires Ivy (a dependency management tool), and you don’t have that on your machine. No problem! There’s a dedicated task that can install Ivy for you. Type this command:

# ant ivy-bootstrap

You should see something like this:

ivy-bootstrap2:
ivy-checksum:
ivy-bootstrap:
BUILD SUCCESSFUL
Total time: 3 seconds

Now we can retry the first command:

# ant clean test

This will execute the whole test suite, which is huge, so take a long coffee break!

Tip: Although this step is not mandatory, it is strongly recommended to check the state of your build before making any change. In this way, you can see whether something is already failing, something that has nothing to do with your changes.

Once the test suite has been executed, type this command if you are using Eclipse:

# ant eclipse

If you are using IntelliJ, type the following command:

# ant idea

This will generate the IDE project files within the current directory (solr_5). From here on, I will assume you’re using Eclipse, but the steps are basically the same for IntelliJ.
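The order of the ant invocations above can be summarized in a short Python sketch. It only builds the list of commands and does not run anything; the setup_commands function and its ivy_installed flag are assumptions introduced for the example, while the ant targets are the ones named in this section.

```python
# IDE name -> ant target that generates its project files (from this section).
IDE_TARGETS = {"eclipse": "eclipse", "intellij": "idea"}


def setup_commands(ide, ivy_installed=False):
    """List the ant commands to run, in order, for the given IDE.

    Hypothetical helper: raises KeyError for an IDE not covered here.
    """
    target = IDE_TARGETS[ide.lower()]
    commands = []
    if not ivy_installed:
        commands.append("ant ivy-bootstrap")  # the build needs Ivy first
    commands.append("ant clean test")         # check the build before changing it
    commands.append(f"ant {target}")          # generate the IDE project files
    return commands


for command in setup_commands("eclipse"):
    print(command)
```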

Open Eclipse and create a new workspace (you can also use the workspace where you loaded the sample projects of this book).

Open the File menu and choose Import. From the dialog that appears, go to General | Existing Projects into Workspace. Using the Browse button, select the /work/solrdev/solr_5 folder. Press OK and then Confirm. The dialog will close and the project will be imported, as shown in this screenshot:

Once the project has been built, you shouldn’t have any errors. Everything is ready, and you can proceed with your change.

Making your changes

We won’t dig very deep in this step because it basically depends on the nature of the task you picked up. For instance, my SOLR-3191 patch changes four existing classes to implement that specific behavior.

Since nobody knows you and your changes will hopefully be integrated into a very popular framework, the most important things to keep in mind are as follows:

Correctness: The implementation must do what it is supposed to do, according to the requirements expressed in the JIRA issue
Documentation: Javadoc at class and method levels (don’t include the @author tag)
Unit tests: These describe and validate your changes

Returning to the SOLR-3191 example, I changed two classes:

org.apache.solr.search.ReturnFields
org.apache.solr.search.SolrReturnFields

These classes contain the logic required by the issue. At the same time, I updated two TestCase classes with several unit tests demonstrating and validating my changes:

org.apache.solr.search.ReturnFieldsTest
org.apache.solr.search.TestPseudoReturnFields

During development, it’s better to periodically execute the test suite, in order to ensure that your changes didn’t introduce any side effects.
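To give an idea of what "unit tests that describe and validate your changes" means for a field exclusion feature, here is a minimal Python sketch. It is not the code of the SOLR-3191 patch and not Solr's API: the resolve_fields function, the "-name" exclusion syntax, and all field names except marc_xml are assumptions chosen purely for illustration.

```python
def resolve_fields(fl, stored_fields):
    """Resolve an fl-like expression into the list of fields to return.

    Hypothetical syntax: '*' means all stored fields, 'name' includes a
    field, '-name' excludes it. An expression made only of exclusions is
    treated as 'everything except those fields'.
    """
    tokens = [token.strip() for token in fl.split(",") if token.strip()]
    excluded = {token[1:] for token in tokens if token.startswith("-")}
    included = [token for token in tokens if not token.startswith("-")]
    if not included or "*" in included:
        candidates = list(stored_fields)
    else:
        candidates = [name for name in stored_fields if name in included]
    return [name for name in candidates if name not in excluded]


STORED = ["id", "title", "author", "marc_xml"]  # only marc_xml comes from the text

# Unit-test-style checks that document the intended behavior.
assert resolve_fields("*,-marc_xml", STORED) == ["id", "title", "author"]
assert resolve_fields("-marc_xml", STORED) == ["id", "title", "author"]
assert resolve_fields("id,title", STORED) == ["id", "title"]
assert resolve_fields("*", STORED) == STORED
```

Each assertion states one requirement in executable form, which is the role the updated test classes play in a real patch.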

Tip: When working in a distributed development environment, it is strongly recommended that you run an svn update command frequently. In this way, you will always be working with the latest version of the branch you checked out.

Okay, take your time and make your changes. Remember to post a message on the issue page in JIRA for every relevant doubt. In this way, the whole history of your work will be in one place.

Creating and submitting a patch

Once the implementation has been completed, everything is working, and the tests are green, it’s time to submit the patch.

Before doing that, open a shell in the /work/solrdev/solr_5 working folder and type this:

# ant precommit

This task will look for problems related to tab indentation, author tags, and broken or wrong links in Javadoc. At the end, type the following command:

# svn stat

You will see a list of source files that have been changed. If all of them are associated with your changes, just type this command to add the new (unversioned) files, those marked with ?, so that they are included in the patch:

# svn stat | grep "^?" | awk '{print $2}' | xargs svn add

Alternatively, you can add those files one by one, using the following command:

# svn add <file>
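What the grep and awk stages of that pipeline select can be sketched in Python. The unversioned_paths function is a hypothetical helper, and the sample status lines use made-up file names; like awk '{print $2}', it takes the second whitespace-separated column, so it shares the same limitation with paths that contain spaces.

```python
def unversioned_paths(status_output):
    """Return the paths of lines whose status column is '?' (unversioned)."""
    paths = []
    for line in status_output.splitlines():
        if line.startswith("?"):          # same filter as grep "^?"
            columns = line.split()
            if len(columns) >= 2:
                paths.append(columns[1])  # same column as awk '{print $2}'
    return paths


# Made-up status output: one modified file and one new, unversioned file.
sample = """M       src/Existing.java
?       src/NewTest.java
"""
print(unversioned_paths(sample))  # ['src/NewTest.java']
```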

Finally, type this command to generate a patch:

# svn diff > /work/patches/SOLR-XXXX.patch

That will create a new file (SOLR-XXXX.patch) under the /work/patches local folder. Here are a couple of things to note:

/work/patches is a sample local directory that I’ve created on my machine. You can put the patch in a different folder.
XXXX is supposed to be replaced with the number of the corresponding JIRA issue. If you are updating an existing patch, the name should always follow this convention because JIRA will take care of highlighting the newest version.
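The naming convention can be captured in a tiny Python sketch. The patch_path function is a hypothetical helper written for this example; the /work/patches default is the sample directory used above.

```python
import posixpath
import re


def patch_path(issue_key, directory="/work/patches"):
    """Return the patch file path for a JIRA key such as 'SOLR-3191'.

    Hypothetical helper: rejects keys that are not 'SOLR-' plus digits,
    so a forgotten 'SOLR-XXXX' placeholder is caught early.
    """
    if not re.fullmatch(r"SOLR-\d+", issue_key):
        raise ValueError(f"not a SOLR issue key: {issue_key!r}")
    return posixpath.join(directory, f"{issue_key}.patch")


print(patch_path("SOLR-3191"))  # /work/patches/SOLR-3191.patch
```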

Tip: If you’ve installed an SVN plugin in your IDE (such as Subclipse or Subversive in Eclipse), you can do everything without using the command line. In Subclipse, for example, there’s a Create Patch command under Team that will guide you through the necessary steps with an easy wizard.

Once you’ve got the patch file, open a browser, log in to JIRA, go to the issue page, and upload the patch. It is recommended that you post a comment with information (including a description) about your submission. That’s all! Now you should follow your issue because several things can happen:

The patch is perfect, so it’s just a matter of time before it is applied.
Some questions come from JIRA users. In that case, you may want to participate in a discussion that might eventually lead to a request for a new version of the patch.

Anyway, the big part is done! You’ve actively participated in the contribution process, and hopefully your artifact will be integrated into Solr. Congrats!

Other ways to contribute

Besides writing code, there are other ways to participate in an open source project. After all, the software is just one component of the final product. There are also support and documentation, which in most cases make the real difference between a good and a bad product from the user’s perspective.

Documentation

Software quality is described by a combination of several factors: functional and non-functional features, internal and external qualities, and last but not least, documentation.

By “documentation”, I personally mean a complex and huge world made up of different types of information for different types of target audience:

Technical internal documentation: Strictly needed by active developers to inform them about the structure or the implementation of the system.
Technical external documentation: Crucial for open source projects representing frameworks, things that can be extended. This is sometimes called the developer guide. This kind of information documents the public API and the extension points that let developers integrate the product with their applications.
User documentation: This enables end users to understand the usage and power of a given system. It is sometimes called a user guide and is the primary source of information for an end user.

Solr has two main places where documentation can be found:

The reference guide, available online at https://cwiki.apache.org/confluence/display/solr/About+This+Guide, or in PDF format
The Solr community Wiki, at https://wiki.apache.org/solr

The first is a guide constituting the official reference documentation. It is created and maintained by Solr committers. On the other hand, the Wiki is a public and collaborative tool. Anyone can potentially edit its content by creating an account and then requesting write permissions from the Solr team. For detailed instructions, refer to http://wiki.apache.org/solr/#How_to_edit_this_Wiki.

Mailing list moderator

A list moderator is a kind of supervisor for a given mailing list and a user with elevated privileges. The moderator can get a list of all subscribers and manually subscribe or unsubscribe a given user.

The moderator checks emails sent to the list from addresses that are not subscribed, in order to improve spam filter rules, and also helps users who face issues related to the lists (for example, subscription and unsubscription).

Summary

In this final chapter, we illustrated the overall contribution process. Solr is an open source project, and its team warmly welcomes any kind of contribution: source code, bug fixing, documentation, and active participation in the mailing lists. There’s no need to be a committer, which would surely be an ambitious goal for a developer. It’s always possible to download the source code, change it, and eventually (if you think the changes could also be useful for other people) create a patch and submit it to the community.

Index

A

add command
  about / Add
  sending / Sending add commands
add command, XML format
  <add> / Add
  commitWithin / Add
  overwrite / Add
  <doc> / Add
  boost / Add
  <field> / Add
Alchemy
  about / UIMA Metadata Extraction Library
alternative query / Alternative query
analyzer sections / The text analysis process
Apache ANT
  URL / Setting up the development environment
  version control / Version control
Apache contribution
  URL / Identifying your needs
Apache Hadoop
  about / MapReduce
Apache POI / Content Extraction Library
Apache Tika framework / Content Extraction Library
Apache UIMA
  about / UIMA Metadata Extraction Library
Apache Velocity
  about / Rapid prototyping with Solaritas
Apache Zookeeper
  about / Cluster management
  URL / Cluster management
autocommit feature / Update handler and autocommit feature

B

background server
  Solr, running as / Different ways to run Solr, Background server
backup
  about / Replication factor, leaders, and replicas
Boolean fields
  about / Boolean
Boolean parameters, service behavior
  waitSearcher / Commit, optimize, and rollback
  waitFlush / Commit, optimize, and rollback
  softCommit / Commit, optimize, and rollback
Boost query parser / Other available parsers
built-in transformers
  ScriptTransformer / Transformers
  DateFormatTransformer / Transformers
  HTMLStripTransformer / Transformers
  LogTransformer / Transformers
  NumberFormatTransformer / Transformers
  RegexTransformer / Transformers
  TemplateTransformer / Transformers

C

cache
  about / Caches
  FilterCache / Caches
  QueryResultCache / Caches
  DocumentCache / Caches
  FieldCache / Caches
  FieldValueCache / Caches
  CustomCache / Caches
  lifecycle / Cache lifecycles
  sizing / Cache sizing
  objects lifecycle / Cached object lifecycle
  LRUCache / Cached object lifecycle
  FastLRUCache / Cached object lifecycle
  LFUCache / Cached object lifecycle
  stats / Cache stats
  types / Types of cache
cache, stats
  lookups / Cache stats
  hits / Cache stats
  hitratio / Cache stats
  inserts / Cache stats
  evictions / Cache stats
  size / Cache stats
  warmupTime / Cache stats
  cumulative_lookups / Cache stats
  cumulative_hits / Cache stats
  cumulative_hitratio / Cache stats
  cumulative_inserts / Cache stats
  cumulative_evictions / Cache stats
cache, types
  filter cache / Filter cache
  query result cache / Query Result cache
  document cache / Document cache
  field value cache / Field value cache
  custom cache / Custom cache
Carrot2 project
  about / Clustering
changes
  creating / Making your changes
char filters / Char filters
  reference link / Char filters
clustering module
  about / Clustering
Collections API, actions
  CREATE / Collections API
  RELOAD / Collections API
  DELETE / Collections API
  LIST / Collections API
  CREATESHARD / Collections API
  SPLITSHARD / Collections API
  DELETESHARD / Collections API
  CREATEALIAS / Collections API
  DELETEALIAS / Collections API
  ADDREPLICA / Collections API
  DELETEREPLICA / Collections API
  CLUSTERPROP / Collections API
  MIGRATE / Collections API
  ADDROLE / Collections API
  REMOVEROLE / Collections API
  OVERSEERSTATUS / Collections API
  CLUSTERSTATUS / Collections API
  REQUESTSTATUS / Collections API
  ADDREPLICAPROP / Collections API
  DELETEREPLICAPROP / Collections API
  BALANCESHARDUNIQUE / Collections API
configuration parameters
  URL / Content Extraction Library
Content Extraction Library / Content Extraction Library
copy fields / Copy fields
Core
  overview / Core overview
CoreAdmin
  about / CoreAdmin
  top toolbar / CoreAdmin
  central area / CoreAdmin
CoreAdmin, central area
  startTime / CoreAdmin
  instanceDir / CoreAdmin
  dataDir / CoreAdmin
  lastModified / CoreAdmin
  version / CoreAdmin
  numDocs / CoreAdmin
  maxDocs / CoreAdmin
  deletedDocs / CoreAdmin
  optimized / CoreAdmin
  current / CoreAdmin
  directory / CoreAdmin
CoreAdmin, top toolbar
  Unload / CoreAdmin
  Rename / CoreAdmin
  Swap / CoreAdmin
  Reload / CoreAdmin
  Optimize / CoreAdmin
custom cache / Custom cache
custom data
  indexing / Indexing custom data
custom response writer
  using / Using a custom response writer

D

Damerau-Levenshtein distance algorithm / Fuzzy
dashboard
  about / Dashboard
  physical and JVM memory / Physical and JVM memory
  disk / Disk usage
  file descriptors / File descriptors
database record
  versus document / The document
DataImportHandler module
  about / DataImportHandler
  data sources / Data sources
  entities / Documents, entities, and fields
  documents / Documents, entities, and fields
  fields / Documents, entities, and fields
  transformer / Transformers
  entity processors / Entity processors
  event listeners / Event listeners
data sources
  about / Data sources
  JdbcDataSource / Data sources
  URLDataSource / Data sources
  BinURLDataSource / Data sources
  FileDataSource / Data sources
  BinFileDataSource / Data sources
  ContentStreamDataSource / Data sources
  BinContentStreamDataSource / Data sources
  FieldReaderDataSource / Data sources
  FieldStreamDataSource / Data sources
date format
  about / Date
default similarity / Default similarity
delete commands
  issuing / Delete
development environment
  setting up / Setting up the development environment
  version control / Version control
  code style / Code style
  code, checking out / Checking out the code
  project creating, in IDE / Creating the project in your IDE
diamond architecture
  about / Master/slaves scenario
Dis
  about / The Disjunction Maximum query parser
  Max / The Disjunction Maximum query parser
disjunction max query / Tie breaker
disjunction sum query / Tie breaker
DisMax query parser
  about / The Disjunction Maximum query parser
  query fields / Query Fields
  alternative query / Alternative query
  minimum number of matches / Minimum should match
  phrase fields / Phrase fields
  query phrase slop / Query phrase slop
  phrase slop / Phrase slop
  boost queries / Boost queries
  additive boost functions / Additive boost functions
  tie parameter / Tie breaker
Document / Input and output data transfer objects
document
  about / The document
  versus database record / The document
documentation
  about / Documentation
  technical internal documentation / Documentation
  technical external documentation / Documentation
  user documentation / Documentation
document cache
  about / Document cache
documents
  about / Documents, entities, and fields
dynamic fields / Dynamic fields

E

Eclipse
  URL / Code style
Eclipse IDE for Java Developers
  URL / Prerequisites
eDisMax query parser
  about / The Extended Disjunction Maximum query parser
  fielded search / Fielded search
  phrase bigram field / Phrase bigram and trigram fields
  phrase trigram field / Phrase bigram and trigram fields
  phrase trigram slop / Phrase bigram and trigram slop
  phrase bigram slop / Phrase bigram and trigram slop
  multiplicative boost function / Multiplicative boost function
  user fields / User fields
  lowercase operators / Lowercase operators
ensemble
  about / Cluster management
entities
  about / Documents, entities, and fields
  root entities / Documents, entities, and fields
  sub entities / Documents, entities, and fields
EntityProcessor
  about / Entity processors
entity processors
  SqlEntityProcessor / Entity processors
  FileListEntityProcessor / Entity processors
  LineEntityProcessor / Entity processors
  MailEntityProcessor / Entity processors
  PlainTextEntityProcessor / Entity processors
  SolrEntityProcessor / Entity processors
  TikaEntityProcessor / Entity processors
  XPathEntityProcessor / Entity processors
event listeners
  about / Event listeners
extensions
  about / Other extensions
  clustering module / Clustering
  UIMA Metadata Extraction Library / UIMA Metadata Extraction Library
  MapReduce / MapReduce

F

facet component
  about / Facet
  facet queries / Facet queries
  facet fields / Facet fields
  facet ranges / Facet ranges
  pivot facets / Pivot facets
  interval facets / Interval facets
faceted search / Facet
facet fields / Facet fields
  facet.field / Facet fields
  facet.prefix / Facet fields
  facet.sort / Facet fields
  facet.limit / Facet fields
  facet.offset / Facet fields
  facet.mincount / Facet fields
  facet.missing / Facet fields
  facet.method / Facet fields
  facet.threads / Facet fields
facet queries / Facet queries
facet ranges
  about / Facet ranges
  facet.range / Facet ranges
  facet.range.start / Facet ranges
  facet.range.end / Facet ranges
  facet.range.gap / Facet ranges
facets / Facet
Factory class / Changing the stored value of fields
FastLRUCache / Cached object lifecycle
fast vector highlighter / Fast vector highlighter
fielded search / Fielded search
field lists / Field lists
Field query parser / Other available parsers
fields
  about / Documents, entities, and fields
fields, Solr schema
  about / Fields
  static / Static fields
  dynamic / Dynamic fields
  copy / Copy fields
fields attributes, Solr schema
  name / Fields
  type / Fields
  indexed / Fields
  stored / Fields
  required / Fields
  default / Fields
  sortMissingFirst / Fields
  sortMissingLast / Fields
  omitNorms / Fields
  omitPositions / Fields
  omitTermFreqAndPositions / Fields
  termVectors / Fields
  docValues / Fields
field types, Solr schema
  about / Field types
  text analysis process / The text analysis process
  char filters / Char filters
  tokenizer / Tokenizers
  token filters / Token filters
  implementing / Putting it all together
  reference link / Some example field types
field types attributes, Solr schema
  name / Field types
  type / Field types
  sortMissingFirst / Field types
  sortMissingLast / Field types
  indexed / Field types
  stored / Field types
  multiValued / Field types
  omitNorms / Field types
  omitTermsAndFrequencyPositions / Field types
  omitPositions / Field types
  positionsIncrementGap / Field types
  autogeneratePhraseQueries / Field types
  compressed / Field types
  compressThreshold / Field types
field types examples, Solr schema
  about / Some example field types
  string / String
  numeric / Numbers
  Boolean fields / Boolean
  date / Date
  text / Text
  currency / Other types
  binary / Other types
  geospatial types / Other types
  random / Other types
field value cache / Field value cache
file descriptors / File descriptors
filter cache
  about / Filter cache
filter queries / Filter queries
FirstQueryITCase integration test / Integration test server
fl parameter
  about / Field lists
Function query parser / Other available parsers
fuzzy query / Fuzzy

H

hard commit / Update handler and autocommit feature
high availability
  about / Replication factor, leaders, and replicas
highlight component
  about / Highlighting
  parameters / Highlighting
  standard highlighter / Standard highlighter
  fast vector highlighter / Fast vector highlighter
  postings highlighter / Postings highlighter
http / Version control
https / Version control

I
<indexConfig> section, attributes
    writeLockTimeout / Index configuration
    maxIndexingThreads / Index configuration
    useCompoundFile / Index configuration
    ramBufferSizeMB / Index configuration
    ramBufferSizeDocs / Index configuration
    mergePolicy / Index configuration
    mergeFactor / Index configuration
    mergeScheduler / Index configuration
    lockType / Index configuration
IDE project, creating / Creating the project in your IDE
indexed fields
    about / String
indexing configuration
    about / Solr indexing configuration, Index configuration
    general settings / General settings
    update handler / Update handler and autocommit feature
    autocommit feature / Update handler and autocommit feature
    RequestHandler / RequestHandler
    UpdateRequestProcessor / UpdateRequestProcessor
index operations
    about / Index operations
    add / Add
    delete commands, issuing / Delete
    commit / Commit, optimize, and rollback
    optimize / Commit, optimize, and rollback
    rollback / Commit, optimize, and rollback
index process
    extending / Extending and customizing the index process
integration test server
    Solr, running as / Different ways to run Solr, Integration test server
IntelliJ
    URL / Code style
interval facets / Interval facets
Inverse Document Frequency (IDF) / Shards
inverted index
    about / The inverted index

J
Java
    URL, for downloading / Prerequisites
Java Development Kit 7 (JDK) / Prerequisites
Java properties
    and thread dump / Java properties and thread dump
Java Virtual Machine (JVM) / Prerequisites
JConsole / JMX
JIRA
    signing up / Signing up on JIRA
    signing up, URL / Signing up on JIRA
    login form, URL / Signing up on JIRA
JMX
    about / JMX
    URL / JMX
Join query parser / Other available parsers
JVisualVM / JMX
JVM memory
    and physical / Physical and JVM memory
JVM options
    URL / Physical and JVM memory

L
language identifier
    about / Language Identifier
LFUCache / Cached object lifecycle
list moderator
    about / Mailing list moderator
load balancing
    about / Replication factor, leaders, and replicas
logging
    about / Logging
LRUCache / Cached object lifecycle
Lucene index / File descriptors
Lucene query parser / Other available parsers

M
M2Eclipse (M2E) / Prerequisites
mailing lists
    subscribing to / Subscribing to mailing lists
Management Beans (MBeans) / JMX
MapReduce
    about / MapReduce
MARCXML / An example – SOLR-3191
master/slave scenario
    about / Master/slaves scenario
Maven Cargo Plugin
    URL / Understanding the project structure
more like this search component
    about / More like this
    parameters / More like this

N
1*n relationship / Documents, entities, and fields
numeric type
    about / Numbers

O
Online Public Access Catalogue (OPAC) / An example – SOLR-3191
Online Public Application Catalogue (OPAC) / Fields
OpenCalais
    about / UIMA Metadata Extraction Library
operators
    AND / Terms, fields, and operators
    OR / Terms, fields, and operators
    + / Terms, fields, and operators
    -/NOT / Terms, fields, and operators
optimize
    about / Commit, optimize, and rollback

P
patch
    submitting / Creating and submitting a patch
    creating / Creating and submitting a patch
PDFBox / Content Extraction Library
phrase fields / Phrase fields
pivot facets / Pivot facets
postings highlighter / Postings highlighter
Processor class / Changing the stored value of fields
project structure, Solr development environment
    about / Understanding the project structure
    src/main/java / Understanding the project structure
    src/main/resources / Understanding the project structure
    src/test/java / Understanding the project structure
    src/test/resources / Understanding the project structure
    src/dev/eclipse / Understanding the project structure
    src/solr-home / Understanding the project structure
    pom.xml / Understanding the project structure

Q
query analyzers / Query analyzers
query fields / Query Fields
query handlers
    about / Query handlers
    handlerStart attribute / Query handlers
    requests attribute / Query handlers
    errors attribute / Query handlers
    timeouts attribute / Query handlers
    totalTime attribute / Query handlers
    avgRequestsPerSecond attribute / Query handlers
    avgTimePerRequest attribute / Query handlers
querying
    about / Querying
    search-related configuration / Search-related configuration
    query analyzers / Query analyzers
    query parameters / Common query parameters
query language
    about / Querying
query parameters
    about / Common query parameters, Query parameters
    q / Common query parameters
    start / Common query parameters
    rows / Common query parameters
    sort / Common query parameters
    defType / Common query parameters
    fl / Common query parameters
    fq / Common query parameters
    wt / Common query parameters
    debugQuery / Common query parameters
    explainOther / Common query parameters
    timeAllowed / Common query parameters
    cache / Common query parameters
    omitHeader / Common query parameters
    field lists / Field lists
    filter queries / Filter queries
    defaults / Query parameters
    appends / Query parameters
    invariants / Query parameters
query parser
    about / Query parsers
    Solr query parser / The Solr query parser
    DisMax query parser / The Disjunction Maximum query parser

    eDisMax query parser / The Extended Disjunction Maximum query parser
query phrase slop / Query phrase slop
query result cache
    about / Query Result cache

R
range searches / Ranges
rapid prototyping, Solaritas / Rapid prototyping with Solaritas
Raw query parser / Other available parsers
RealTimeGetHandler / RealTimeGetHandler
repeater
    about / Master/slaves scenario
replica
    about / Replication factor, leaders, and replicas
replication factor
    about / Replication factor, leaders, and replicas
replication mechanism
    commit / Master/slaves scenario
    optimize / Master/slaves scenario
    startup / Master/slaves scenario
repository tree
    URL / Version control
RequestHandler / RequestHandler
response output writers
    about / Response output writers
    xml / Response output writers
    xslt / Response output writers
    json / Response output writers
    csv / Response output writers
    velocity / Response output writers
    javabin / Response output writers
    python / Response output writers
    ruby / Response output writers
    php / Response output writers
rollback / Commit, optimize, and rollback
root-entities / Documents, entities, and fields

S
sample project
    about / The sample project
schema.xml file / schema.xml
schema sections
    about / Other schema sections
    unique key / Unique key
    default similarity / Default similarity
search-related configuration
    about / Search-related configuration
    settings / Search-related configuration
search component
    about / Search components
    query / Query
    facet / Facet
    highlight / Highlighting
    more like this / More like this
    query elevation / Other components
    terms / Other components
    stats / Other components
    spellcheck / Other components
    term vector / Other components
    debug / Other components
search components / Search components
search handler
    about / Search handler
    standard request handler / Standard request handler
    RealTimeGetHandler / RealTimeGetHandler
shards
    about / Shards
    URL / Shards
    using / Shards
    with replication / Shards with replication
size-estimator-lucene-solr.xls
    URL / Prerequisites
soft commit / Update handler and autocommit feature
Solid State Disks (SSD) / Disk usage
Solr
    latest version, downloading / Downloading the right version
    URL, for download bundle / Downloading the right version
    server, setting up / Setting up and running the server
    server, running / Setting up and running the server
    running, as background server / Different ways to run Solr, Background server

    running, as integration test server / Different ways to run Solr, Integration test server
    about / What do we have installed?, Extending Solr
    other resources / Other resources
    real time and indexed data, mixing / Mixing real-time and indexed data
    custom response writer, using / Using a custom response writer
    data, adding to / Adds and deletes
    data, deleting / Adds and deletes
    searching with / Search
    bindings / Other bindings
    requirements, identifying / Identifying your needs
    reference guide, URL / Documentation
    URL / Documentation
Solr, clients
    URL / Other bindings
SOLR-3191
    about / An example – SOLR-3191
    URL / An example – SOLR-3191
solr-x.y.z directory / Setting up and running the server
solr.xml / solr.xml
SolrCloud
    about / SolrServer – the Solr façade, SolrCloud
    URL / SolrCloud
    cluster management / Cluster management
    replication factor / Replication factor, leaders, and replicas
    leaders / Replication factor, leaders, and replicas
    replicas / Replication factor, leaders, and replicas
    durability / Durability and recovery
    recovery / Durability and recovery
    features / The new terminology
    administration console / Administration console
    Collections API / Collections API
    distributed search / Distributed search
    cluster-aware index / Cluster-aware index
Solr community Wiki
    URL / Documentation
solrconfig.xml file / solrconfig.xml
Solr core
    about / The Solr core
Solr data model
    about / Understanding the Solr data model
    document / The document
    inverted index / The inverted index
Solr development environment

    setting up / Setting up a Solr development environment
    prerequisites / Prerequisites
    sample project, importing / Importing the sample project of this chapter
    project structure / Understanding the project structure
Solr extension, GitHub
    URL / Commit, optimize, and rollback
Solr home
    about / Solr home
Solr index / File descriptors
Solritas
    about / Rapid prototyping with Solaritas
    rapid prototyping / Rapid prototyping with Solaritas
Solrj
    about / Solrj
    SolrServer / SolrServer – the Solr façade
    input data transfer object / Input and output data transfer objects
    output data transfer object / Input and output data transfer objects
Solr query parser
    about / The Solr query parser
    terms / Terms, fields, and operators
    fields / Terms, fields, and operators
    operators / Terms, fields, and operators
    boosts / Boosts
    wildcard characters / Wildcards
    fuzzy query / Fuzzy
    proximity / Proximity
    range searches / Ranges
Solr schema
    about / The Solr schema
    field types / Field types
    fields / Fields
SolrServer
    about / SolrServer – the Solr façade
    EmbeddedSolrServer / SolrServer – the Solr façade
    HttpSolrServer / SolrServer – the Solr façade
    LBHttpSolrServer / SolrServer – the Solr façade
    ConcurrentUpdateSolrServer / SolrServer – the Solr façade
    CloudSolrServer / SolrServer – the Solr façade
Solr source repository
    URL / Physical and JVM memory
sort fields
    about / String
Spatial filter query parser / Other available parsers
SQLEntityProcessor

    about / Entity processors
standalone instance, of Solr
    about / Standalone instance
standalone Solr instance
    installing / Installing a standalone Solr instance
    prerequisites / Prerequisites
standard highlighter / Standard highlighter
standard request handler
    about / Standard request handler
    search components / Search components
    query parameters / Query parameters
static fields / Static fields
stored value, of fields
    modifying / Changing the stored value of fields
string type
    about / String
    indexed fields / String
    sort fields / String
sub-entities / Documents, entities, and fields
subversion
    about / Version control
Surround query parser / Other available parsers

T
technical external documentation / Documentation
technical internal documentation / Documentation
term
    about / The text analysis process
Term query parser / Other available parsers
text
    about / Text
text analysis process
    about / The text analysis process
    position increment / The text analysis process
    start and end offset / The text analysis process
    payload / The text analysis process
thread dump
    and Java properties / Java properties and thread dump
thresholds, for triggering auto-commits
    maxDocs / Update handler and autocommit feature
    maxTime / Update handler and autocommit feature
tie parameter / Tie breaker
token filters
    about / Token filters
    reference link / Token filters
tokenizer
    about / Tokenizers
    reference link / Tokenizers
transformer
    about / Transformers
transformers
    URL / Field lists
troubleshooting
    about / Troubleshooting, Troubleshooting
    UnsupportedClassVersionError error / UnsupportedClassVersionError
    failed to read artifact descriptor / The “Failed to read artifact descriptor” message
    multivalued fields / Multivalued fields and the copyField directive
    copyField directive / Multivalued fields and the copyField directive, Required fields and the copyField directive
    copyField input value / The copyField input value
    required fields / Required fields and the copyField directive
    stored text, immutable / Stored text is immutable!
    data not indexed / Data not indexed
troubleshooting, Solr
    about / Troubleshooting, No score is returned in response

U
UIMA Metadata Extraction Library / UIMA Metadata Extraction Library
unique key / Unique key
UnsupportedClassVersionError error / UnsupportedClassVersionError
update handler / Update handler and autocommit feature
update handlers
    about / Update handlers
    commits attribute / Update handlers
    autocommit maxTime attribute / Update handlers
    autocommits attribute / Update handlers
    soft autocommits attribute / Update handlers
    optimizes attribute / Update handlers
    rollbacks attribute / Update handlers
    expungeDeletes attribute / Update handlers
    docsPending attribute / Update handlers
    adds attribute / Update handlers
    deletesById attribute / Update handlers
    deletesByQuery attribute / Update handlers
    errors attribute / Update handlers
    cumulative_adds / Update handlers
    cumulative_deletesById / Update handlers
    cumulative_deletesByQuery / Update handlers
    cumulative_errors / Update handlers
UpdateRequestProcessor / UpdateRequestProcessor
user documentation / Documentation
user guide / Documentation

W
wildcard characters / Wildcards
