Table of Contents
Apache Solr Essentials
Credits
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Get Me Up and Running
Installing a standalone Solr instance
Prerequisites
Downloading the right version
Setting up and running the server
Setting up a Solr development environment
Prerequisites
Importing the sample project of this chapter
Understanding the project structure
www.it-ebooks.info
Different ways to run Solr
Background server
Integration test server
What do we have installed?
Solr home
solr.xml
schema.xml
solrconfig.xml
Other resources
Troubleshooting
UnsupportedClassVersionError
The “Failed to read artifact descriptor” message
Summary
2. Indexing Your Data
Understanding the Solr data model
The document
The inverted index
The Solr core
The Solr schema
Field types
The text analysis process
Char filters
Tokenizers
Token filters
Putting it all together
Some example field types
String
Numbers
Boolean
Date
Text
Other types
Fields
Static fields
Dynamic fields
Copy fields
Other schema sections
Unique key
Default similarity
Solr indexing configuration
General settings
Index configuration
Update handler and autocommit feature
RequestHandler
UpdateRequestProcessor
Index operations
Add
Sending add commands
Delete
Commit, optimize, and rollback
Extending and customizing the index process
Changing the stored value of fields
Indexing custom data
Troubleshooting
Multivalued fields and the copyField directive
The copyField input value
Required fields and the copyField directive
Stored text is immutable!
Data not indexed
Summary
3. Searching Your Data
The sample project
Querying
Search-related configuration
Query analyzers
Common query parameters
Field lists
Filter queries
Query parsers
The Solr query parser
Terms, fields, and operators
Boosts
Wildcards
Fuzzy
Proximity
Ranges
The Disjunction Maximum query parser
Query Fields
Alternative query
Minimum should match
Phrase fields
Query phrase slop
Phrase slop
Boost queries
Additive boost functions
Tie breaker
The Extended Disjunction Maximum query parser
Fielded search
Phrase bigram and trigram fields
Phrase bigram and trigram slop
Multiplicative boost function
User fields
Lowercase operators
Other available parsers
Search components
Query
Facet
Facet queries
Facet fields
Facet ranges
Pivot facets
Interval facets
Highlighting
Standard highlighter
Fast vector highlighter
Postings highlighter
More like this
Other components
Search handler
Standard request handler
Search components
Query parameters
RealTimeGetHandler
Response output writers
Extending Solr
Mixing real-time and indexed data
Using a custom response writer
Troubleshooting
Queries don’t match expected documents
Mismatch between index and query analyzer
No score is returned in response
Summary
4. Client API
Solrj
SolrServer – the Solr façade
Input and output data transfer objects
Adds and deletes
Search
Other bindings
Summary
5. Administering and Tuning Solr
Dashboard
Physical and JVM memory
Disk usage
File descriptors
Logging
Core Admin
Java properties and thread dump
Core overview
Caches
Cache life cycles
Cache sizing
Cached object life cycle
Cache stats
Types of cache
Filter cache
Query Result cache
Document cache
Field value cache
Custom cache
Query handlers
Update handlers
JMX
Summary
6. Deployment Scenarios
Standalone instance
Shards
Master/slaves scenario
Shards with replication
SolrCloud
Cluster management
Replication factor, leaders, and replicas
Durability and recovery
The new terminology
Administration console
Collections API
Distributed search
Cluster-aware index
Summary
7. Solr Extensions
DataImportHandler
Data sources
Documents, entities, and fields
Transformers
Entity processors
Event listeners
Content Extraction Library
Language Identifier
Rapid prototyping with Solritas
Other extensions
Clustering
UIMA Metadata Extraction Library
MapReduce
Summary
8. Contributing to Solr
Identifying your needs
An example – SOLR-3191
Subscribing to mailing lists
Signing up on JIRA
Setting up the development environment
Version control
Code style
Checking out the code
Creating the project in your IDE
Making your changes
Creating and submitting a patch
Other ways to contribute
Documentation
Mailing list moderator
Summary
Index
Apache Solr Essentials
Copyright © 2015 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: February 2015
Production reference: 1210215
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78439-964-1
www.packtpub.com
Credits
Author
Andrea Gazzarini
Reviewers
Ahmad Maher Abdelwhab
Markus Klose
Julian Lam
Puneet Singh Ludu
Commissioning Editor
Usha Iyer
Acquisition Editor
Larissa Pinto
Content Development Editor
Kirti Patil
Technical Editor
Ankur Ghiye
Copy Editor
Vikrant Phadke
Project Coordinator
Nidhi J. Joshi
Proofreaders
Stephen Copestake
Maria Gould
Bernadette Watkins
Indexer
Priya Sane
Graphics
Abhinash Sahu
Production Coordinator
Shantanu N. Zagade
Cover Work
About the Author
Andrea Gazzarini is a software engineer. He has mainly focused on Java technology. Although often involved in analysis and design, he loves coding and definitely likes to be considered a developer.
Andrea has more than 15 years of experience in various software branches, from telecom to banking software. He has worked for several medium- and large-scale companies, such as IBM and Orga Systems.
Andrea has several certifications in the Java programming language (programmer, developer, web component developer, business component developer, and JEE architect), BEA products (build and portal solutions), and Apache Solr (Lucid Apache Solr/Lucene Certified Developer).
In 2009, Andrea stepped into the wonderful world of open source projects, and in the same year, he became a committer for the Apache Qpid project. His adventure with Solr began in 2010, when he joined @Cult, an Italian company that mainly focuses its projects on library management systems, online public access catalogs, and linked data.
He’s currently involved in several (too many!) projects, always thinking about a “big” idea that will change his (developer) life.
Acknowledgments
I’d like to begin by thanking the people who made this book what it is. Writing a book is not a single person’s work, and help from experienced people who guide you along the path is crucial. Many thanks to Larissa, Kirti, Ankur, and Vikrant for supporting me in this process.
I am also grateful to the technical reviewers of the book, Ahmad Maher Abdelwhab, Markus Klose, Puneet Singh Ludu, and Julian Lam, for carefully reading my drafts and spotting (hopefully) most of my mistakes. This book would not have been so good without their help and input.
In general, I want to thank everyone who directly or indirectly helped me in creating this book, except for a long-sighted teacher who once told me when I was in university, “Hey, guy with all those earrings! You won’t go anywhere!”
Finally, a special thought to my family: to my girls, the actual supporters of the book; my wonderful wife, Nicoletta (to whom I promise not to write another book); my pride and joy, Sofia and Caterina; and my first actual teacher, my mom, Lina. They are the people who really made sacrifices while I was writing and who definitely deserve the credit for the book.
Once again, thank you!
About the Reviewers
Ahmad Maher Abdelwhab is currently working at Knowledgeware Technologies as an open source developer. He has over 10 years of experience, with special development skills in PHP, Drupal, Perl, Ruby on Rails, Java, XML, XSL, MySQL, PostgreSQL, MongoDB, SQL, and Linux. He graduated in computer science from Mansoura University in 2005.
I would like to thank my father, mother, and sincere wife for their continuous support while reviewing this book.
Markus Klose is a search and big data consultant at SHI GmbH & Co. KG in Germany. He is in charge of project management and supervision, project analysis, and delivering consulting and training services.
Most of Markus’ daily business is related to Apache Solr, Elasticsearch, and FAST ESP. He travels across Germany, Switzerland, and Austria to provide his services and knowledge.
On a regular basis, you can find him at meetups, user groups, or conferences such as Berlin Buzzwords or Solr Revolution, where he speaks about Apache Solr.
Besides search-related training and consulting, he is currently establishing additional areas of work. He uses tools such as Logstash and Kibana to fulfill customer requirements in monitoring and analytics.
Thanks to the experience gained from his daily work, Markus wrote the first German book on Apache Solr (Einführung in Apache Solr) with his colleague, Daniel Wrigley. It was published by O’Reilly in February 2014.
Besides writing, Markus spends a lot of his free time using his knowledge and programming skills to work on and contribute to open source projects such as the Latin stemmer and number converter for Solr (https://issues.apache.org/jira/browse/LUCENE-4229) and the Solr Appender for log4j2 (https://issues.apache.org/jira/browse/LOG4J2-618).
Julian Lam is a cofounder and core maintainer of NodeBB, a type of free and open source forum software built upon modern web tools, such as Node.js and Redis. He has spoken several times on topics related to JavaScript in the workplace and best practices for hiring. Julian is an advocate of client-side rendering, which can be used to build highly performant web applications.
Support files, eBooks, discount offers, and more
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <service@packtpub.com> for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt’s online digital book library. Here, you can search, access, and read Packt’s entire library of books.
Why subscribe?
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Free access for Packt account holders
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.
Hi Dad, when you bought me my first computer, you had no idea what was coming next…
Preface
As you may have guessed from the title, this is a book about Apache Solr, specifically about Solr essentials. What do I mean by essentials? Nice question! Such a term can be seen from many perspectives. Solr, mainly from 2010 onwards, has witnessed exponential growth in terms of popularity, stakeholders, community, and the capabilities it offers. This rapid growth is reflected in the rich portfolio of features that have been developed over these years and are available today. So, strictly speaking, it’s not so easy to define the “essentials” of Solr.
The perspective that I will use to explain the term “essentials” is quite simple and pragmatic. I will describe the building blocks of Apache Solr, and at the same time, I will try to bring my personal experience to those topics. In recent years, I’ve worked with Solr in several projects. As a user, I had to learn how to install, configure, tune, troubleshoot, and monitor Solr. As a developer, my needs were different. If you’re working in the IT domain and you’re reading this book (I guess you are), you probably know that each time you try to implement a solution, there’s something in the project that a specific tool doesn’t cover. So, after spending a lot of time analyzing, reading documentation, searching on the Internet, reading wikis, and so on, you realize that you need to add a custom piece of code somewhere. That’s because “the product covers 99.9999 percent of the possible scenarios, but…” for your specific case, if this happens or that happens, you always fall into that 0.0001 percent. I don’t know about you, but for me, this has always been so. No matter what the project, the company, or the team is, this has been an implicit constant of every project, always.
That’s the reason I will try as much as possible to explain things throughout the book using real-world examples coming directly from my personal experience. I hope this additional perspective will be useful for a better understanding of what is considered the most popular open source search platform.
What this book covers
Chapter 1, Get Me Up and Running, introduces the basic concepts of Solr and provides you with all the necessary steps to quickly get it up and running.
Chapter 2, Indexing Your Data, begins our first detailed discussion on Solr. In this chapter, we look at the data indexing process and see how it can be configured, tuned, and customized. This is also where we encounter the first line of code.
Chapter 3, Searching Your Data, explores the other side of the coin. First, we stored our data; now we explore all that Solr offers in terms of search services.
Chapter 4, Client API, covers client-side usage of Solr libraries, providing a description of the main use cases from a client’s perspective.
Chapter 5, Administering and Tuning Solr, takes you through the available tools for configuring, managing, and tuning Solr.
Chapter 6, Deployment Scenarios, illustrates the various ways in which you can deploy Solr, from a standalone instance to a distributed cluster.
Chapter 7, Solr Extensions, describes several available Solr extensions and how they can be useful in solving common concrete use cases.
Chapter 8, Contributing to Solr, explains the wonderful world of open source software by illustrating the constituent pieces of the process of participation and contribution.
What you need for this book
In order to run the code examples in the book, you will need the Java Development Kit (JDK) 1.7 and Apache Maven.
Alternatively, you can use an Integrated Development Environment (IDE). Eclipse is strongly recommended, as it is the same environment I used to capture the screenshots. However, even if you want to use another IDE, the steps should be quite similar.
The difference between the two alternatives mainly resides in the role that you want to assume while reading. As a user, you may only want to start and execute the examples; as a developer, you will surely want to see the working code in a usable environment. That’s the reason an IDE is strongly recommended in the second case.
The first chapter will provide the instructions necessary for installing all that you’ll need throughout the book.
Who this book is for
This book is targeted at people (users and developers) who are new to Apache Solr or are experienced with a similar product. The book will gradually help you to understand the focal concepts of Solr with the help of practical tips and real-world use cases. Although all the examples associated with the book can be executed with a few simple commands, familiarity with the Java programming language is required for a good understanding.
Conventions
In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: “Each folder has a subfolder called conf where the configuration for that specific core resides.”
A block of code is set as follows:
[
  {"id":1,"title":"The Birthday Concert"},
  {"id":2,"title":"Live in Italy"},
  {"id":3,"title":"Live in Paderborn"}
]
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" words="stopwords.txt"
    ignoreCase="true"/>
Any command-line input or output is written as follows:
# mvn cargo:run -P fieldAnalysis
New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: “Choose a field type or a field. Then press the Analyse Values button.”
Note
Warnings or important notes appear in a box like this.
Tip
Tips and tricks appear like this.
Reader feedback
Feedback from our readers is always welcome. Let us know what you think about this book: what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.
To send us general feedback, simply e-mail <feedback@packtpub.com>, and mention the book’s title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Customer support
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
Downloading the example code
You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
Alternatively, you can also download the examples from GitHub, at https://github.com/agazzarini/apache-solr-essentials. There, you can download the whole content as a zip file from https://github.com/agazzarini/apache-solr-essentials/archive/master.zip or, if you have git installed on your machine, you can clone the repository by issuing the following command:
# git clone https://github.com/agazzarini/apache-solr-essentials.git <path-to-your-work-dir>
Here, <path-to-your-work-dir> is the destination folder where the project will be cloned.
Errata
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books (maybe a mistake in the text or the code), we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.
To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy
Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at <copyright@packtpub.com> with a link to the suspected pirated material.
We appreciate your help in protecting our authors and our ability to bring you valuable content.
Questions
If you have a problem with any aspect of this book, you can contact us at <questions@packtpub.com>, and we will do our best to address the problem.
Chapter 1. Get Me Up and Running
This chapter describes how to install Solr and focuses on all the steps required to get a complete study and development environment that will guide us through the book.
Specifically, following the two perspectives (user and developer) described in the Preface, I will illustrate two kinds of installation. The first is the installation of a standalone Solr instance, which is very quick because the download bundle is preconfigured with all that you need to get your first taste of the product. The second is what I, as a developer, really need every day in my ordinary job: a working integrated development environment where I can run and debug Solr with my configurations and customizations, without having to manage an external server. In general, such an environment will have all that I need in one place for developing, debugging, and running unit and integration tests.
By the end of the chapter, you will have a running Solr instance on your machine, a ready-to-use Integrated Development Environment (IDE), and a good understanding of some basic concepts.
This chapter will cover the following topics:
Installation of a simple, standalone Solr instance from scratch
Setting up an Integrated Development Environment
A quick overview of what we installed
Troubleshooting
Installing a standalone Solr instance
Solr is available for download as an archive that, once uncompressed, contains a fully working instance within a Jetty servlet engine. So the steps here should be pretty easy.
Prerequisites
In this section, we will describe a couple of prerequisites for the machine where Solr needs to be installed.
First of all, Java 6 or 7 is required: the exact choice depends on which version of Solr you want to install. In general, regardless of the version, make sure you have the latest update of your Java Virtual Machine (JVM). The following table describes the association between the latest Solr and Java versions:
Solr version    Java version
4.7.x           Java 6 or greater
4.8.x           Java 7 (update 55) or greater; Java 8 is verified to be compatible
4.9.x           Java 7 (update 55) or greater; Java 8 is verified to be compatible
4.10.x          Java 7 (update 55) or greater
Java can be downloaded from http://www.oracle.com/technetwork/java/javase/downloads/index.html.
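As a quick sanity check before installing, the table can be encoded in a small helper and run on the JVM you are about to use. SolrJavaCompat and its methods are hypothetical, written only for this illustration, and cover just the 4.7.x to 4.10.x rows. It applies the table literally, so any newer Java release counts as “greater”; the table predates Java 9, so a positive answer for Java 9 or later should not be read as proof that Solr 4.x runs there.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper encoding the Solr/Java table above.
 * Only the Solr 4.7.x to 4.10.x rows are covered.
 */
public class SolrJavaCompat {

    /** Returns {feature release, update}, e.g. "1.7.0_55" -> {7, 55}, "11.0.2" -> {11, 0}. */
    static int[] parseJava(String version) {
        // Pre-Java 9 scheme: 1.<major>.<minor>_<update>
        Matcher legacy = Pattern.compile("^1\\.(\\d+)(?:\\.\\d+)?(?:_(\\d+))?").matcher(version);
        if (legacy.find()) {
            int update = legacy.group(2) == null ? 0 : Integer.parseInt(legacy.group(2));
            return new int[] {Integer.parseInt(legacy.group(1)), update};
        }
        // Java 9+ scheme: <major>[.<minor>.<security>]
        Matcher modern = Pattern.compile("^(\\d+)").matcher(version);
        if (modern.find()) {
            return new int[] {Integer.parseInt(modern.group(1)), 0};
        }
        throw new IllegalArgumentException("Unrecognized Java version: " + version);
    }

    /** Minimum {feature release, update} required by the table for a Solr 4.x version. */
    static int[] minimumJavaFor(String solrVersion) {
        String[] parts = solrVersion.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        if (major != 4 || minor < 7 || minor > 10) {
            throw new IllegalArgumentException("Not covered by the table: " + solrVersion);
        }
        return minor == 7 ? new int[] {6, 0} : new int[] {7, 55};
    }

    /** True if the Java version satisfies the table's "or greater" rule for that Solr version. */
    static boolean isCompatible(String solrVersion, String javaVersion) {
        int[] min = minimumJavaFor(solrVersion);
        int[] actual = parseJava(javaVersion);
        if (actual[0] != min[0]) {
            return actual[0] > min[0]; // any newer feature release counts as "greater"
        }
        return actual[1] >= min[1];
    }

    public static void main(String[] args) {
        String solr = args.length > 0 ? args[0] : "4.10.3";
        String java = System.getProperty("java.version");
        System.out.println("Solr " + solr + " on Java " + java + ": "
                + (isCompatible(solr, java) ? "meets the table" : "too old"));
    }
}
```

For example, isCompatible("4.9.0", "1.7.0_51") is false, because 4.9.x asks for update 55 or later, while isCompatible("4.7.2", "1.6.0_45") is true.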
Other factors such as CPU, RAM, and disk space strongly depend on what you are going to do with this Solr installation. Nowadays, it shouldn’t be hard to have a couple of GB available on your workstation. However, bear in mind that at this moment I’m playing with Solr 4.9.0 installed on a Raspberry Pi (with 512 MB of RAM). I gave Solr a maximum heap (-Xmx) of 256 MB, indexed about 500 documents, and executed some queries without any problem. But again, those factors really depend on what you want to do: assuming you’re using a modern PC for a study instance, hardware resources shouldn’t be a problem.
If, instead, you are planning a Solr installation in a test or production environment, you can find a useful spreadsheet at https://svn.apache.org/repos/asf/lucene/dev/trunk/dev-tools/size-estimator-lucene-solr.xls.
Although it cannot encompass all the peculiarities of your environment, it is definitely a good starting point for RAM and disk space estimation.
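If you want to see how much heap a JVM actually receives for a given -Xmx value (for instance, the 256 MB mentioned above for the Raspberry Pi), a tiny class run with the same flag will tell you. HeapCheck is a hypothetical name for this sketch; it only reports what Runtime.maxMemory() returns, which can be somewhat lower than the -Xmx value.

```java
/**
 * Prints the maximum heap this JVM will try to use. Launch it with the same
 * -Xmx you intend to give Solr, for example: java -Xmx256m HeapCheck
 */
public class HeapCheck {

    /** Maximum heap in MB, or -1 if the JVM reports no limit of its own. */
    static long maxHeapMb() {
        long max = Runtime.getRuntime().maxMemory();
        // Long.MAX_VALUE means the JVM imposes no limit of its own
        return max == Long.MAX_VALUE ? -1 : max / (1024 * 1024);
    }

    public static void main(String[] args) {
        long mb = maxHeapMb();
        System.out.println(mb < 0 ? "No explicit heap limit" : "Max heap: " + mb + " MB");
    }
}
```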
Downloading the right version
The latest version of Solr at the time of writing is 4.10.3, but a lot of the things we will discuss in the book are valid for previous versions as well.
You might already have Solr somewhere and might not want to download another instance, your customer might already have a previous version, or, in general, you might not want the latest version. Therefore, I will try to refer to several versions in the book (from 4.7.x to 4.10.x) as often as possible. Each time a feature is described, I will indicate the version where it first appeared.
The download bundle is usually available as a tgz or zip archive. You can find it at https://lucene.apache.org/solr/downloads.html.
Setting up and running the server
Once the Solr bundle has been downloaded, extract it in a folder. We will refer to that folder as $INSTALL_DIR. Type the following command to extract the Solr bundle:
# tar -xzvf $DOWNLOAD_DIR/solr-x.y.z.tgz -C $INSTALL_DIR
or
# unzip $DOWNLOAD_DIR/solr-x.y.z.zip -d $INSTALL_DIR
depending on the format of the bundle.
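If you would rather script the extraction from Java (for example, inside a setup tool), the zip variant can be sketched with java.util.zip alone. ExtractBundle is a hypothetical class for illustration, not a replacement for unzip: it does not restore file permissions, and it handles only the zip bundle, since the JDK has no built-in tar support.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Rough Java equivalent of: unzip $DOWNLOAD_DIR/solr-x.y.z.zip -d $INSTALL_DIR */
public class ExtractBundle {

    static void unzip(Path zipFile, Path installDir) throws IOException {
        Path target = installDir.toAbsolutePath().normalize();
        try (InputStream raw = Files.newInputStream(zipFile);
             ZipInputStream zip = new ZipInputStream(raw)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path out = target.resolve(entry.getName()).normalize();
                // Refuse entries that would escape the install dir ("zip slip")
                if (!out.startsWith(target)) {
                    throw new IOException("Entry outside target dir: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Copies the bytes of the current entry only; the zip stream stays open
                    Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
                }
                zip.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to solr-x.y.z.zip, args[1]: $INSTALL_DIR
        unzip(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```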
At the end, you will find a new solr-x.y.z folder in your $INSTALL_DIR folder. $INSTALL_DIR can act as a container for all the Solr instances you may want to play with. Here is a screenshot of that folder on my machine, where you can see I have three Solr versions:
The solr-x.y.z directory contains Jetty, a fast and small servlet engine, with Solr already deployed inside. So, in order to start Solr, we need to start Jetty. Open a new shell and type the following commands:
# cd $INSTALL_DIR/solr-x.y.z/example
# java -jar start.jar
You should see a lot of log messages ending with something like this:
...
[INFO] org.eclipse.jetty.server.AbstractConnector – Started SocketConnector@0.0.0.0:8983
...
[INFO] org.apache.solr.core.SolrCore – [collection1] Registered new searcher Searcher@66b664d7[collection1] main{StandardDirectoryReader(segments_2:3:nrt _0(4.9):C32)}
These messages tell you Solr is up and running! Open a web browser and go to http://127.0.0.1:8983/solr.
You should see the following page:
This is the Solr administration console.
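Besides opening the browser, you can check from code that the instance answers. The following sketch uses only the JDK’s HttpURLConnection; SolrPing is a hypothetical class, probing /admin/cores?action=STATUS (the CoreAdmin status call) is an assumption chosen as a cheap request, and the 2-second timeouts are arbitrary.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Checks whether a Solr instance answers on the given base URL. */
public class SolrPing {

    /**
     * Returns true if GET {baseUrl}/admin/cores?action=STATUS answers 200.
     * Connection errors and timeouts are reported as "not up".
     */
    static boolean isSolrUp(String baseUrl) {
        try {
            HttpURLConnection conn = (HttpURLConnection)
                    URI.create(baseUrl + "/admin/cores?action=STATUS").toURL().openConnection();
            conn.setConnectTimeout(2000);
            conn.setReadTimeout(2000);
            try {
                return conn.getResponseCode() == 200;
            } finally {
                conn.disconnect();
            }
        } catch (IOException | IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String base = args.length > 0 ? args[0] : "http://127.0.0.1:8983/solr";
        System.out.println(base + (isSolrUp(base) ? " is up" : " is not reachable"));
    }
}
```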
Setting up a Solr development environment
This section will guide you through the steps necessary to have a working development environment, that is, a place to write and execute your code or configurations against a running and debuggable Solr instance.
If you aren’t interested in such a perspective because, for instance, your usage scenario falls within the previous section, you can safely skip this and proceed with the next section.
The source code included with this book contains a ready-to-use project for this section. I will later explain how to get it into your workspace in one shot.
Prerequisites
The development workstation needs some software. As you can see, I kept the list small and minimal.
Firstly, you need the Java Development Kit 7 (JDK), of which I recommend the latest update, although the oldest version of Solr covered by this book (4.7.x) is able to run with Java 6. Java 7 is supported from 4.7.x to 4.10.x, so it is definitely a recommended choice.
Secondly, we need an IDE. Specifically, I will use Eclipse to illustrate and describe the developer perspective, so you should download a recent Java SE version (that is, Eclipse IDE for Java Developers) from https://www.eclipse.org/downloads.
Note
Do not download the EE version of Eclipse, because it contains a lot of things we don’t need in this book.
Starting from Eclipse Juno, all the required plugins are already included. However, if you love an older version of Eclipse (such as Indigo) like I do, then Maven integration for Eclipse, also known as M2Eclipse (M2E), needs to be installed. You can find this in the Eclipse Marketplace (go to Help | Eclipse Marketplace, then search for m2e, and click on the Install button).
Importing the sample project of this chapter
It’s time to see some code and get our hands dirty. We will guide you through the steps necessary to configure Eclipse with a sample project, where you will be able to start, stop, and debug Solr with your code.
First, you have to import the sample project in your local ch1 folder into Eclipse. I assume you have already got the source code from the publisher’s website or from GitHub, as described in the Preface. Open Eclipse, create a new workspace, and go to File | Import | Maven | Existing Maven Projects.
In the dialog box that appears, select the ch1 folder and click on the Finish button. Eclipse will detect the Maven layout of that folder and will create a new project in your workspace, as illustrated in the following screenshot (Project Explorer view):
Understanding the project structure
The project you’ve imported is very simple and contains just a few lines of code, but it is useful for introducing some common concepts that will guide us through the book (the other chapters use examples with a similar structure).
The following table shows the structure of the project:
Folder or file        Description
src/main/java         The main source folder. It will contain the Solr extensions (and dependent classes) you want to implement. You won’t find this directory in this first project because we don’t have any source files yet.
src/main/resources    This contains project resources such as properties and configuration files. You won’t find this directory in this first project because we don’t have any resources yet.
src/test/java         This source folder contains unit and integration tests. For this first project, you will find a single integration test here.
src/test/resources    This contains test resources such as properties and configuration files. It includes a sample logging configuration (log4j.xml).
src/dev/eclipse       Preconfigured Eclipse launchers used to run Solr and the examples in the project.
src/solr-home         This contains the Solr configuration files. We will describe the content of this directory later.
pom.xml               This is the Maven project definition. Here, you can configure any feature of your project, including dependencies, properties, and so on.
Within the Maven project definition (that is, pom.xml), you can do a lot of things. For our purposes right now, it is important to highlight the plugins section, where you can see the Maven Cargo Plugin (http://cargo.codehaus.org/Maven2+plugin) configured to run an embedded Jetty 7 container and deploy Solr. Here’s a screenshot that shows the Cargo Plugin configuration section:
If you have the Build Automatically flag set (the default behavior in Eclipse), Eclipse has most probably already downloaded all the required dependencies. This is one of the great things about Apache Maven.
So, assuming that you have no errors, it’s now time to start Solr. But where is Solr?
The first question that probably comes to mind is: “I didn’t download Solr! Where is it?” The answer, again, is Apache Maven, which is definitely a great open source tool for software management and something that simplifies your life.
Maven is already included in your Eclipse (by means of the m2e plugin), and the project you previously imported is a fully compliant Maven project.
So don’t worry! When we start a Maven build, Solr will be downloaded automatically. But where? Into your local Maven repository, and you don’t need to concern yourself with that.
Note
Within the pom.xml file, you will find a property, <solr.version>, with a specific value. If you want to use a different version, just change the value of this property.
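If you are curious anyway, Maven’s standard repository layout makes the location predictable: the group ID becomes a directory path, followed by the artifact ID, the version, and a file named artifactId-version.packaging. The sketch below assumes the default local repository (~/.m2/repository, which a localRepository entry in settings.xml can override) and the coordinates org.apache.solr:solr with war packaging, which match the download URLs Maven prints when it fetches Solr. LocalSolrWar is a hypothetical helper, not part of the project.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Where Maven's default local repository keeps the Solr WAR for a given version. */
public class LocalSolrWar {

    static Path solrWarPath(Path localRepo, String solrVersion) {
        // Maven layout: groupId dirs / artifactId / version / artifactId-version.packaging
        return localRepo.resolve(Paths.get("org", "apache", "solr", "solr",
                solrVersion, "solr-" + solrVersion + ".war"));
    }

    public static void main(String[] args) {
        String version = args.length > 0 ? args[0] : "4.9.0";
        // Default location; a <localRepository> entry in settings.xml overrides it
        Path repo = Paths.get(System.getProperty("user.home"), ".m2", "repository");
        Path war = solrWarPath(repo, version);
        System.out.println(war + (Files.exists(war) ? " (present)" : " (not downloaded yet)"));
    }
}
```

Changing <solr.version> simply makes Maven fetch, and then look for, a different version directory under the same path.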
Different ways to run Solr
It’s time to start Solr in your IDE for the first time but, prior to that, it’s important to distinguish the two ways to run Solr:
Background server: As a background server, so that you can start and stop Solr for debugging purposes
Integration test server: As an integration test server, so that you can have a dedicated Solr instance to run your integration test suite
Background server
The first thing you will need in your IDE is a server instance that you can start, stop, and (in general) manage with a few simple commands.
In this way, you will be able to have Solr running with your configurations. You can index your data and execute queries in order to (manually) ensure that things are working as expected.
To get this type of server, follow these instructions:
1. Right-click on the project and create a new Maven (Debug) launch configuration (Debug As | Maven build…).
2. In the dialog, type cargo:run in the Goals text field.
3. Next, click on the Debug button, as shown in the following screenshot:
The very first time you run this command, Maven will download all the required dependencies and plugins, including Solr. At the end, it will start an embedded Jetty instance.
Note
Why a Debug instead of a Run configuration?
You must use a Debug configuration so that you will be able to stop the server by simply pressing the red button on the Eclipse console. Run configurations have an annoying habit: Eclipse will say the process is stopped, but Jetty will still be running, often leaving an orphan process.
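If you suspect such an orphan, a quick way to find out is to check whether anything still accepts connections on port 8983. PortCheck is a hypothetical sketch using a plain TCP connect; it only tells you that some process is listening there, not that it is Jetty or Solr, and the 1-second timeout is arbitrary.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Tells whether something (for example, an orphan Jetty) is still listening on a port. */
public class PortCheck {

    /** True if a TCP connection to host:port succeeds within the timeout. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused or timed out: nothing is accepting connections there
        }
    }

    public static void main(String[] args) {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 8983;
        System.out.println("Port " + port
                + (isListening("127.0.0.1", port, 1000) ? " is still in use" : " has no listener"));
    }
}
```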
You should see the following output in the Eclipse console:
[INFO] ------------------------------------------------------------
[INFO] Building Chapter 1 Project 1.0
[INFO] ----------------------------------------------------------
Downloading: http://repo1.maven.org/maven2/org/apache/solr/solr/4.9.0/solr-4.9.0.war
Downloaded: http://repo1.maven.org/maven2/org/apache/solr/solr/4.9.0/solr-4.9.0.war (28585 KB at 432.5 KB/sec)
...
[INFO] Jetty 7.6.15.v20140411 Embedded started on port [8983]
This means that Solr is up and running and it is listening on port 8983. Now open your web browser and go to http://127.0.0.1:8983/solr. You should see the Solr administration console.
Tip
In the project, and specifically in the src/dev/eclipse folder, there are some useful, ready-to-use Eclipse launchers. Instead of following the manual steps illustrated previously, just right-click on the start-embedded-solr.launch file and go to Debug As | run-ch1-example-server.launch.
Integration test server
Another important thing you could (or should, in my opinion) do in your project is to have an integration test suite. Integration tests are classes that, as the name suggests, run verifications against a running server.
When you’re working on a project with Solr and you want to implement an extension, a search component, or a plugin, you will obviously want to ensure that it is working properly. If you’re running an external Solr server, you need to pack your classes in a jar, copy that bundle somewhere (later, we will see where), start the server, and execute your checks.
There are a lot of drawbacks with this approach. Each time you get something wrong, you need to repeat the whole process: fix, pack, copy, restart the server, prepare your data, and run the check again. Also, you cannot easily debug your classes (or Solr classes) during that iterative check. All of this will most probably end with a lot of statements in your code such as the following:
System.out.println("BLABLABLA");
I suppose you know what I’m talking about.
This is where integration tests become very helpful. You can code your checks and your assertions as normal Java classes, and have an automated test suite that does the following each time it is executed:
Starts an embedded Solr instance
Executes your tests against that instance
Stops the Solr instance
Produces useful reports
The project we set up previously already has that capability, and there's a very basic integration test in the src/test/java folder that simply adds and queries some data.
To run the integration test suite, create a new Maven run configuration (right-click on the project and go to Run As | Maven build…) and, in the dialog box, type clean install in the Goals text field.
After clicking on the Run button, you should see something like this:
...
[INFO] Jetty 7.6.15.v20140411 Embedded starting…
...
[INFO] Reading Solr Schema from schema.xml
...
[INFO] Jetty 7.6.15.v20140411 Embedded started on port [8983]
...
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.gazzax.labs.solr.ase.ch1.it.FirstQueryITCase
...
Results:
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
Tip: As before, there is already a preconfigured Eclipse launcher for this scenario in the src/dev/eclipse folder. Right-click on the start-embedded-solr.launch file and go to Debug As | run-the-example-as-integration-test.
The Eclipse log shows that a test (specifically, an integration test) has been successfully executed. You can find the source code of that test in the project we checked out before. The class reported in the log is FirstQueryITCase (IT stands for Integration Test), and it is in the org.gazzax.labs.solr.ase.ch1.it package.
The FirstQueryITCase.java class demonstrates a basic interaction flow with Solr:
import static org.junit.Assert.assertEquals;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

// Excerpt: "client" is the SolrJ client that the test class sets up elsewhere (not shown).

// This is the (input) Data Transfer Object between your client and Solr.
final SolrInputDocument input = new SolrInputDocument();
// 1. Populates it with (at least the required) fields
input.setField("id", 1);
input.setField("title", "Apache SOLR Essentials");
input.setField("author", "Andrea Gazzarini");
input.setField("isbn", "972-2-5A619-12A-X");
// 2. Adds the document
client.add(input);
// 3. Commits changes
client.commit();
// 4. Builds a new query object with a "select all" query.
final SolrQuery query = new SolrQuery("*:*");
// 5. Executes the query
final QueryResponse response = client.query(query);
// 6. Gets the (output) Data Transfer Object.
final SolrDocument output = response.getResults().iterator().next();
final String id = (String) output.getFieldValue("id");
final String title = (String) output.getFieldValue("title");
final String author = (String) output.getFieldValue("author");
final String isbn = (String) output.getFieldValue("isbn");
// 7.1 If we are running as a Java application, prints out the query results.
System.out.println("It works! I found the following book:");
System.out.println("--------------------------------------");
System.out.println("ID: " + id);
System.out.println("Title: " + title);
System.out.println("Author: " + author);
System.out.println("ISBN: " + isbn);
// 7.2 Otherwise, asserts the query results using standard JUnit procedures.
assertEquals("1", id);
assertEquals("Apache SOLR Essentials", title);
assertEquals("Andrea Gazzarini", author);
assertEquals("972-2-5A619-12A-X", isbn);
Tip: FirstQueryITCase is both an integration test and a main class. This means you can run it in three ways: as described earlier, as a main class, or as a JUnit test. If you choose the second or third option, remember to start Solr first (using run-ch1-example-server.launch). The launchers are in the src/dev/eclipse folder. Just right-click on one of them and run the example in whichever way you prefer.
What do we have installed?
Regardless of the kind of installation, you should now have a Solr instance up and running, so it's time for a quick overview of its structure.
Solr is a standard JEE web application, packaged as a .war archive. If you downloaded the bundle from the website, you can find it under the webapps folder of Jetty, usually under:
$INSTALL_DIR/solr-x.y.z/example/webapps
If, instead, you followed the developer way, Maven downloaded that war file for you, and it is now in your local repository (usually a folder called .m2 under your home directory).
Solr home
In any case, Solr has been installed, and you don't need to worry about where it is physically located, mainly because everything you have to provide to Solr must reside in an external folder, usually referred to as the Solr home.
The download bundle contains a preconfigured Solr home folder: $INSTALL_DIR/solr-x.y.z/example/solr. In your Eclipse project, you can find it under the src folder; it is called (not surprisingly) solr-home.
A Solr home folder typically contains a file called solr.xml and one or more folders that correspond to your Solr cores (we will see what a core is in Chapter 2, Indexing Your Data). Each core folder has a subfolder called conf, which holds the configuration for that specific core.
solr.xml
The first file you will find in the Solr home directory is solr.xml. It declares some configuration parameters for the instance.
Before Solr 4.4, you had to declare all the cores of your instance in this file. Now there's a more intelligent autodiscovery mechanism, so you no longer have to explicitly declare the cores that are part of your configuration.
In the download bundle, you will find an example of a Solr home with only one core:
$INSTALL_DIR/solr-x.y.z/example/solr
There is also an example with two cores:
$INSTALL_DIR/solr-x.y.z/example/multicore
This directory uses the old style we mentioned previously, with all the cores explicitly declared. In the Eclipse project, the single-core example is in a directory called solr-home, and the multicore example is in the example-solr-home-with-multicore folder.
schema.xml
The schema.xml file will be described in detail later, but it deserves a brief mention here. This is where you declare how the index of a specific core is composed, in terms of fields, types, and analysis, both at index time and at query time. In other words, this is the schema of your index and (most probably) the first thing you have to design in your Solr project.
In the download bundle, you can find the sample schema.xml in the $INSTALL_DIR/solr-x.y.z/example/solr/collection1/conf folder. It is huge and full of comments, and it basically illustrates all the predefined fields and types you can use in Solr (you can create your own type, but that's definitely an advanced topic).
If you want to see something simpler for now, the Eclipse project has a very simple schema in the solr-home/conf directory, with a few fields and only one field type.
solrconfig.xml
The solrconfig.xml file is where the configuration of a Solr core is defined. It can contain a lot of directives and sections but, fortunately, Solr's creators have set defaults for most of them, which are automatically applied if you don't declare them.
Note: Default values are good for a lot of scenarios. When I was in Barcelona at Apache Lucene Eurocon in 2011, the speaker asked during a presentation, “How many of you have ever changed default values in solrconfig.xml?” In a large room (200 people), only five or six guys raised their hands.
This is most probably the second file you will have to configure. Once the schema has been defined, you can fine-tune the index chain and search behavior of your Solr instance here.
Other resources
Schema and Solr configurations can use other files for several purposes, such as stopwords, synonyms, or configuration files specific to some component. Those files usually go in the conf directory of the Solr core.
Troubleshooting
If you have problems with the steps described previously, the following tips should help you get things working.
UnsupportedClassVersionError
You can install more than one version of Java on your machine, but when you run a command (for example, java or javac), the system picks up the Java interpreter or compiler that is declared in your PATH. So if you get an UnsupportedClassVersionError, you're using the wrong JVM (most probably Java 6 or older). The Prerequisites section earlier in this chapter has a table that will help you. Here is the short version: Solr 4.7.x allows Java 6 or 7, but Solr 4.8 or greater requires at least Java 7.
If you're starting Solr from the command line, just type this:
# java -version
The output of this command shows the version of Java your system is actually using. Make sure you're running the right JVM, and also check your JAVA_HOME environment variable, which must point to the right JVM.
If you're running Solr in Eclipse, first check the JVM that starts Eclipse, as described previously. Then make sure Eclipse itself uses a correct JVM by navigating to Window | Preferences | Java | Installed JREs.
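If you are not sure which JVM a launcher actually uses, a tiny class run from that same launcher can tell you. The following is a minimal sketch. The class name JavaVersionCheck is hypothetical. It relies only on the standard java.specification.version and java.home system properties, and it applies the Java 7 minimum stated above for Solr 4.8 and later.

```java
// Sketch: reports the Java version of the running JVM and whether it meets
// the Java 7 minimum that Solr 4.8 and later require.
final class JavaVersionCheck {

    // "java.specification.version" is "1.6", "1.7", "1.8" up to Java 8,
    // and "9", "11", "17", and so on from Java 9 onwards.
    static int majorVersion(String specVersion) {
        String v = specVersion.startsWith("1.") ? specVersion.substring(2) : specVersion;
        int dot = v.indexOf('.');
        return Integer.parseInt(dot < 0 ? v : v.substring(0, dot));
    }

    static boolean canRunSolr48(String specVersion) {
        return majorVersion(specVersion) >= 7;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        System.out.println("JVM: Java " + majorVersion(spec)
                + " from " + System.getProperty("java.home"));
        System.out.println(canRunSolr48(spec)
                ? "This JVM can run Solr 4.8 or later."
                : "Too old: Solr 4.8 or later needs at least Java 7.");
    }
}
```

Run it with the same launcher (or the same java command) you use for Solr. The printed java.home tells you exactly which installation was picked up.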
The “Failed to read artifact descriptor” message
When you run a command for the first time (for example, clean, install, or test), Apache Maven has to download all the required libraries, so your system must have a working Internet connection.
If you get this message, Maven wasn't able to download a required dependency. The name of the dependency should appear in the message. The failure could be caused by a network issue, either permanent or transient.
In the first case, simply check your connection. In the second case (that is, a transient network failure during the download), you need to take some manual steps. Assume that the dependency is org.apache.solr:solr-solrj:jar:4.8.0. Go to your local Maven repository and remove the content of the folder that hosts that dependency, like this:
# rm -rf $HOME/.m2/repository/org/apache/solr/solr-solrj/4.8.0
On the next build, Maven will download that dependency again.
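To find the folder to delete for a dependency other than solr-solrj, apply the same mapping from coordinate to path. The sketch below makes two assumptions. First, it assumes the default local repository location, $HOME/.m2/repository; a localRepository entry in Maven's settings.xml can change it. Second, it assumes a coordinate in the groupId:artifactId:type:version form used above. The class and method names are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: maps a Maven coordinate (groupId:artifactId:type:version, as shown in
// the error message) to the folder that holds it in the local repository.
final class LocalRepoPath {

    static Path folderFor(Path repository, String coordinate) {
        String[] parts = coordinate.split(":");
        if (parts.length < 4) {
            throw new IllegalArgumentException(
                    "Expected groupId:artifactId:type:version, got " + coordinate);
        }
        Path folder = repository;
        // Dots in the groupId become nested folders: org.apache.solr -> org/apache/solr
        for (String segment : parts[0].split("\\.")) {
            folder = folder.resolve(segment);
        }
        // Then the artifactId, then the version (the last element of the coordinate)
        return folder.resolve(parts[1]).resolve(parts[parts.length - 1]);
    }

    public static void main(String[] args) {
        // Assumes the default local repository location.
        Path repository = Paths.get(System.getProperty("user.home"), ".m2", "repository");
        System.out.println(folderFor(repository, "org.apache.solr:solr-solrj:jar:4.8.0"));
    }
}
```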
Summary
In this chapter, we began our Solr tour with a quick overview, including the steps needed to install Solr. We illustrated the installation process from both a user's and a developer's perspective. Regardless of the path you followed, you should now have a working Solr installation on your machine.
In the next chapter, we will dig further into the Solr indexing process.
Chapter 2. Indexing Your Data
Although the ultimate reason for running a Solr instance is fast and efficient search, the first (and mandatory) step is to populate that instance with data. This operation is usually referred to as the indexing phase. The term index plays an important role in the Solr domain because Solr's underlying structure is itself an index. This chapter focuses on the indexing process.
By the end of this chapter, you will be reasonably familiar with how the indexing process works in Solr, how to index data, and how to configure and customize the process.
This chapter will cover the following topics:
The Solr data model: inverted index, document, fields, types, analyzers, and tokenizers
Index and indexing configuration
The Solr write path
How to extend and customize the indexing process
Troubleshooting
Understanding the Solr data model
Whenever I start to learn something complex, I strongly believe the key to controlling that complexity is a good understanding of its domain model. This section describes the underlying building blocks of Solr. It starts with the simplest piece of information, the document, and then walks through the other fundamental concepts, describing how they form the Solr data model.
The document
A document is the basic, atomic unit of information in Solr. It is a container of fields and values that belong to a given entity of your domain model (for example, a book, car, or person).
If you're familiar with relational databases, you can think of a document as a record. The two concepts have some similarities:
A document could have a primary key, which is the logical identity of the data it represents.
A document has a structure consisting of one or more attributes. Each attribute has a name, type, and value.
However, a Solr document differs from a database record in the following ways:
Attributes can have more than one value, whereas a column in a database row can hold only one value (including NULL).
Attributes either have a value or don't exist at all. There's no notion of a NULL value in Solr.
Attribute names can be static or dynamic, but table columns in a database must be explicitly declared in advance.
Attribute types are, in general, richer and more flexible because they must define how Solr interprets data at both index and query time.
Attribute types can be defined and configured. You can use, mix, and configure a rich set of built-in classes, or create new types (this is actually an advanced scenario).
A simple way to represent a Solr document is a map, a general data structure that maps unique keys (attribute names) to values, where each key (that is, attribute) can have one or more values. The following JSON data represents two documents:
[
  {
    "id": 27302038,
    "title": "A book about something",
    "author": ["Ashler, Frank", "York, Lye"],
    "subject": ["Generalities", "Social Sciences"],
    "language": "English"
  },
  {
    "id": 2830002,
    "title": "Another book about something",
    "author": "Ypsy, Lea",
    "subject": "Geography & History",
    "publisher": "Vignanello: Edikin, 2010"
  }
]
Both documents represent books and share some common attributes, but they differ in structure. The first has two subjects and a language. The second has no publication language, only one subject, and an additional publisher attribute.
From a document's perspective, there are no constraints on which attributes a document can have, or how many. Those constraints are instead declared in the Solr schema, which we will see later.
Tip: The src/solr/example-data folder of the project associated with this chapter contains some example data where the same documents are represented in several formats.
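To make the map analogy concrete, here is a minimal sketch of a document as a map from attribute names to lists of values, populated with the two books above. The DocumentSketch class is hypothetical and is not SolrJ's SolrInputDocument. It only illustrates two points: attributes can be multivalued, and a missing attribute simply does not exist.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: a document modeled as a map from attribute names to one or more
// values. Illustration only; this is not SolrJ's SolrInputDocument.
final class DocumentSketch {
    private final Map<String, List<Object>> attributes = new LinkedHashMap<>();

    // Adding more values to an existing attribute makes it multivalued.
    DocumentSketch add(String name, Object... values) {
        if (values.length == 0) {
            return this; // an attribute without values is simply not created
        }
        attributes.computeIfAbsent(name, k -> new ArrayList<>()).addAll(Arrays.asList(values));
        return this;
    }

    // A missing attribute has no values at all; there is no NULL placeholder.
    List<Object> get(String name) {
        List<Object> values = attributes.get(name);
        return values == null ? Collections.emptyList() : Collections.unmodifiableList(values);
    }

    boolean has(String name) {
        return attributes.containsKey(name);
    }

    public static void main(String[] args) {
        DocumentSketch first = new DocumentSketch()
                .add("id", 27302038)
                .add("title", "A book about something")
                .add("author", "Ashler, Frank", "York, Lye")
                .add("subject", "Generalities", "Social Sciences")
                .add("language", "English");
        DocumentSketch second = new DocumentSketch()
                .add("id", 2830002)
                .add("title", "Another book about something")
                .add("author", "Ypsy, Lea")
                .add("subject", "Geography & History")
                .add("publisher", "Vignanello: Edikin, 2010");
        System.out.println(first.get("author").size()); // 2
        System.out.println(second.has("language"));     // false
    }
}
```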
The inverted index
Solr uses an underlying, persistent structure called an inverted index. It is designed and optimized to allow fast searches at retrieval time. To gain the speed benefits of such a structure, the index has to be built in advance.
An inverted index consists of an ordered list of all the terms that appear in a set of documents. Next to each term, the index lists the documents where that term appears.
For example, let's consider three documents:
[
  { "id": 1, "title": "The Birthday Concert" },
  { "id": 2, "title": "Live in Italy" },
  { "id": 3, "title": "Live in Paderborn" }
]
The corresponding inverted index would be something like this:
Terms        Document IDs
Birthday     1
Concert      1
In           2, 3
Italy        2
Live         2, 3
Paderborn    3
The          1
Think of the index at the end of a book: just as it takes you straight to the pages that mention a word, an inverted index lets you find the documents that contain a given term quickly and efficiently.
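The following sketch builds the same kind of structure for the three example titles. It rests on a simplifying assumption: terms come from a plain whitespace split and are lowercased, so "The" becomes "the". In Solr, the field's text analysis, described later in this chapter, decides what the terms are. The class name is hypothetical.

```java
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Sketch: builds an inverted index (term -> sorted IDs of the documents that
// contain it). Terms come from a naive whitespace split plus lowercasing;
// in Solr, the field's analysis chain decides what the terms are.
final class InvertedIndexSketch {

    static TreeMap<String, SortedSet<Integer>> build(Map<Integer, String> titles) {
        TreeMap<String, SortedSet<Integer>> index = new TreeMap<>(); // keeps terms ordered
        for (Map.Entry<Integer, String> doc : titles.entrySet()) {
            for (String token : doc.getValue().split("\\s+")) {
                index.computeIfAbsent(token.toLowerCase(Locale.ROOT), t -> new TreeSet<>())
                     .add(doc.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<Integer, String> titles = new TreeMap<>();
        titles.put(1, "The Birthday Concert");
        titles.put(2, "Live in Italy");
        titles.put(3, "Live in Paderborn");
        // Looking up a term is a map access instead of a scan over every document.
        build(titles).forEach((term, ids) -> System.out.println(term + " -> " + ids));
    }
}
```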
In Solr, index files are stored in a so-called Solr data directory. This directory can be configured in solrconfig.xml, the main configuration file.
Tip: After running any example in the project associated with this book, you will find the Solr index in the subfolders of target/solr. The name of each subfolder depends on the name of the core used in the example.
The Solr core
The index configuration of a given Solr instance resides in a Solr core, which is a container for a specific inverted index. On disk, Solr cores are directories, each with some configuration files that define the features and characteristics of the core.
A core directory typically contains the following:
A core.properties file that describes the core.
A conf directory that contains configuration files: a schema.xml file, a solrconfig.xml file, and a set of additional files that depend on the components in use (for example, stopwords.txt and synonyms.txt).
A lib directory. Every JAR file placed in this directory is automatically loaded and can be used by that specific core.
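Because a core.properties file marks a core directory, the autodiscovery mentioned in the previous chapter can be pictured as a directory walk. The sketch below is a simplification, not Solr's implementation. It treats every directory under the Solr home that contains a core.properties file as a core, and names it after the file's name property, falling back to the directory name. The class name and the default src/solr-home path are assumptions.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Simplified sketch of core discovery: every directory under the Solr home that
// contains a core.properties file is treated as a core, named after the file's
// "name" property or, if that is missing, after the directory itself.
final class CoreDiscoverySketch {

    static List<String> discoverCores(Path solrHome) throws IOException {
        List<Path> markers;
        try (Stream<Path> paths = Files.walk(solrHome)) {
            markers = paths
                    .filter(p -> p.getFileName() != null
                            && p.getFileName().toString().equals("core.properties"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        List<String> cores = new ArrayList<>();
        for (Path marker : markers) {
            Properties props = new Properties();
            try (Reader in = Files.newBufferedReader(marker)) {
                props.load(in);
            }
            String directoryName = marker.getParent().getFileName().toString();
            cores.add(props.getProperty("name", directoryName));
        }
        return cores;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default: the solr-home folder of the Eclipse project.
        Path solrHome = Paths.get(args.length > 0 ? args[0] : "src/solr-home");
        if (!Files.isDirectory(solrHome)) {
            System.out.println("Not a directory: " + solrHome);
            return;
        }
        System.out.println("Cores found: " + discoverCores(solrHome));
    }
}
```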
A Solr installation can have one or more cores, each with a different configuration and therefore a different inverted index.
Note: The concept of the Solr core has been expanded in Solr 4, specifically in SolrCloud. We will discuss this in Chapter 6, Deployment Scenarios.
The Solr schema
Returning to the comparison with databases, another important difference is that, in relational databases, data is organized in tables. You can create one or more tables depending on how you want to organize the persistence of the entities in your domain model.
In Solr, things work differently. There's no notion of tables. In a Solr schema, you declare attributes, a primary key, and a set of constraints and features of the entity represented by the incoming documents. A schema can describe more than one entity, but for simplicity, let's think of it this way for the moment: a Solr schema is like the definition of a single table that describes the structure and constraints of the incoming data (that is, documents).
The Solr schema is defined in a file called (not surprisingly) schema.xml. It contains several concepts, but the most important are those related to types and fields. Before Solr 4.8, types and fields were declared within a <types> and a <fields> tag, respectively. Now their declarations can be mixed, which allows better grouping of fields with their corresponding types.
Tip: The download bundle we set up in the previous chapter includes a sample schema at $INSTALL_DIR/solr-x.y.z/example/solr/collection1/conf/schema.xml. It is huge and contains a lot of examples of predefined and built-in types and fields, with many useful comments.
Field types
Field types are one of the top-level entities declared in Solr schemas. A field type is declared using the <fieldType> element. As you can see in the example schema, a type can be simple, such as this:
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
Types can also carry a lot of information, as shown here:
<fieldType name="text-general" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/>
  </analyzer>
</fieldType>
All types share a set of common attributes, described in the following table:
Attribute    Description
name    The name of the field type. This is required.
class    The fully qualified name of the class that implements the field type behavior. This is required.
sortMissingFirst, sortMissingLast    Optional attributes that are valid only for sortable fields. They define the sort position of documents that have no value for a given field.
indexed    If this is true, fields associated with this type will be searchable, sortable, and facetable.
stored    If this is true, fields associated with this type are retrievable. Briefly, stored fields are what Solr returns in search responses.
multiValued    If this is true, fields associated with this type can have multiple values.
omitNorms    Norms are values, one byte per field, where Solr records index-time boost and length normalization data. Index-time boost allows one field to be boosted higher than another. Length normalization allows shorter fields to be boosted more than longer fields. If you don't use index-time boost and don't want length normalization, you can set this attribute to true.
omitTermFreqAndPositions    Tokens produced by text analysis during the index process are not simply text. They also have metadata such as offsets, term frequency, and optional payloads. If this attribute is set to true, Solr won't record term frequencies and positions.
omitPositions    Omits the positions in indexed tokens.
positionIncrementGap    When a field has multiple values, this attribute specifies the position gap inserted between consecutive values. It is used to prevent unwanted phrase matches across values.
autoGeneratePhraseQueries    Only valid for text fields. If this is set to true, Solr will automatically generate phrase queries for adjacent terms.
compressed    In order to decrease the index size, stored values of fields can be compressed.
compressThreshold    When the field is compressed, this is the associated compression threshold.
In addition, each specific type can declare its own attributes, depending on the characteristics of the type itself.
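The effect of positionIncrementGap is easiest to see on a multivalued field such as the author attribute of the first example book. The sketch below is hypothetical and greatly simplified: it uses naive tokenization and phrases of exactly two terms. It assigns positions to each value's tokens, leaving gap unused positions between values. It then shows that a phrase query for frank york matches across the two different authors only when the gap is 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Sketch: assigns token positions to the values of a multivalued field, adding
// `gap` extra positions between consecutive values, as positionIncrementGap
// does. A two-term phrase matches only when its terms occupy adjacent positions.
final class PositionGapSketch {

    static final class Posting {
        final String term;
        final int position;

        Posting(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    static List<Posting> index(List<String> values, int gap) {
        List<Posting> postings = new ArrayList<>();
        int position = -1;
        for (int v = 0; v < values.size(); v++) {
            if (v > 0) {
                position += gap; // skip `gap` positions before each further value
            }
            // Naive tokenization on whitespace and commas, plus lowercasing.
            for (String token : values.get(v).toLowerCase(Locale.ROOT).split("[\\s,]+")) {
                if (!token.isEmpty()) {
                    postings.add(new Posting(token, ++position));
                }
            }
        }
        return postings;
    }

    static boolean phraseMatches(List<Posting> postings, String first, String second) {
        for (Posting a : postings) {
            for (Posting b : postings) {
                if (a.term.equals(first) && b.term.equals(second) && b.position == a.position + 1) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> authors = Arrays.asList("Ashler, Frank", "York, Lye");
        // Without a gap, "frank york" spans two different authors and still matches.
        System.out.println(phraseMatches(index(authors, 0), "frank", "york"));   // true
        System.out.println(phraseMatches(index(authors, 100), "frank", "york")); // false
    }
}
```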
The text analysis process
Before talking about fields, which are the top-level building blocks of the Solr schema, let's introduce a fundamental concept: text analysis.
The text analysis process converts an incoming value into tokens by means of a dedicated transformation chain that manipulates the original input value. Each resulting token is then posted to the index with the following metadata:
Position increment: The position of the token relative to the previous token in the input stream
Start and end offset: The starting and ending indexes of the token within the input stream
Payload: An optional byte array used for several purposes, such as boosting
A token with its metadata is usually referred to as a term.
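As an illustration of that metadata, the following hypothetical sketch splits its input on whitespace and records each token's position increment and start/end offsets. Payloads are left out. The end offset is exclusive, pointing one character past the token, which is also how Lucene reports it.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: a whitespace tokenizer that records, for each token, the position
// increment and the start/end offsets described above. Payloads are omitted.
final class TokenMetadataSketch {

    static final class Token {
        final String text;
        final int positionIncrement; // 1 = directly after the previous token
        final int startOffset;       // index of the token's first character
        final int endOffset;         // index one past the token's last character

        Token(String text, int positionIncrement, int startOffset, int endOffset) {
            this.text = text;
            this.positionIncrement = positionIncrement;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        @Override
        public String toString() {
            return text + " (+" + positionIncrement + ", " + startOffset + "-" + endOffset + ")";
        }
    }

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            while (i < input.length() && Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            int start = i;
            while (i < input.length() && !Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            if (i > start) {
                // A tokenizer emits each token with increment 1; filters that drop
                // tokens (such as a stop filter) may raise the next token's increment.
                tokens.add(new Token(input.substring(start, i), 1, start, i));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token token : tokenize("Apache Solr")) {
            System.out.println(token);
        }
    }
}
```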
In Solr, text analysis happens at two different moments: index time and query time. At index time, the value is the content of a given field in a document that a client sent for indexing. At query time, the incoming value typically contains the search terms of a query.
In both cases, you must tell Solr how to handle those values. You can do that in the field types section of the schema.
For field types, the following general rules always apply:
If the field type implementation class is solr.TextField or extends solr.TextField, Solr allows you to configure one or two analyzer sections to customize the index and/or query text analysis process.
In other cases, no analyzers can be defined, and the type is configured using the available attributes of the type itself.
This is an example of a field type definition:
<fieldType name="text-general" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    …
  </analyzer>
  <analyzer type="query">
    …
  </analyzer>
</fieldType>
Here, you can see two different analyzer sections. In the first section, you declare what happens at index time for any field associated with that field type. The second section has the same purpose, but it applies at query time.
Note: If you have the same analysis at index and query time, you can define just one <analyzer> section with no type attribute. It will then apply to both phases.
Within each analyzer definition, you define the text analysis process by means of character filters, tokenizers, and token filters.
Char filters
Char filters are optional components that can be placed at the beginning of the analysis chain to preprocess field values. They can manipulate a character stream by adding, removing, or replacing characters, while keeping track of the original character positions so that token offsets still refer to the original text.
In the following example, two char filters are used. The first replaces letters that carry diacritics (such as à and ü) with their plain counterparts, and the second removes some text:
<analyzer type="index">
  <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
  <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\(Author\)" replacement=""/>
</analyzer>
Note: You must never declare the implementation class. Instead, declare its factory.
Using the preceding chain, the text “Millöcker, Carl” (an author name) will become “Millocker, Carl”.
A complete list of available char filters can be found at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#CharFilterFactories.
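The behavior of the two char filters in the example can be approximated in plain Java, with two stated substitutions. First, diacritic folding is done with Unicode decomposition (java.text.Normalizer) instead of the rules in mapping-FoldToASCII.txt. Second, the offset correction that real char filters perform is not modeled. The class name is hypothetical.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Sketch approximating the two char filters above. Folding uses Unicode
// decomposition (a stand-in for mapping-FoldToASCII.txt, not the same table),
// and offset correction is not modeled.
final class CharFilterSketch {
    private static final Pattern COMBINING_MARKS = Pattern.compile("\\p{M}+");
    private static final Pattern AUTHOR_LABEL = Pattern.compile("\\(Author\\)");

    // NFD splits o-umlaut (U+00F6) into "o" plus a combining mark, which is then dropped.
    static String foldDiacritics(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return COMBINING_MARKS.matcher(decomposed).replaceAll("");
    }

    // Same regex as the PatternReplaceCharFilterFactory pattern attribute.
    static String removeAuthorLabel(String input) {
        return AUTHOR_LABEL.matcher(input).replaceAll("");
    }

    // Char filters run in declaration order, like the <charFilter> elements.
    static String apply(String input) {
        return removeAuthorLabel(foldDiacritics(input));
    }

    public static void main(String[] args) {
        System.out.println(apply("Mill\u00f6cker, Carl (Author)"));
    }
}
```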
Tokenizers
A tokenizer breaks an incoming character stream into one or more tokens according to specific criteria. The resulting set of tokens is usually referred to as a token stream. An analyzer chain allows only one tokenizer.
Suppose the input text is “I’m writing a simple text”. The following table shows how two sample tokenizers work:
Tokenizer              Description             Tokens
WhitespaceTokenizer    Splits on whitespace    “I’m”, “writing”, “a”, “simple”, “text”
KeywordTokenizer       Doesn't split at all    “I’m writing a simple text”
A complete list of available tokenizers can be found at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#TokenizerFactories.
Token filters
Token filters work on an input token stream and transform it in some way. Working token by token, a filter can apply its logic to add, remove, or replace tokens, and thus produce a new output token stream.
Token filters can be chained together to produce complex analysis chains. The order in which the filters are declared is important because the chain is not commutative: two chains with the same filters in a different order can produce different output streams.
This is an extract of a sample filter chain:
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" words="stopwords.txt"
        ignoreCase="true"/>
A filter declaration includes the name of the implementation factory class and a set of attributes that are specific to each filter. In the preceding chain, this is what happens for each token in the input stream:
The token is lowercased, so “Happy” becomes “happy”.
If the token is a stop word, that is, one of the words declared in a file called stopwords.txt, it is removed from the outgoing stream.
A complete list of available token filters is available at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#TokenFilterFactories.
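To see why the order of token filters matters, the following sketch models each filter as a function from a token list to a new token list and runs the same two filters in both orders. It assumes a case-sensitive stop filter, that is, one without ignoreCase="true", unlike the extract above. It also uses a two-word stand-in for stopwords.txt. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

// Sketch: token filters as functions over a token list, applied in declaration
// order. The stop filter is case-sensitive here, so swapping the two filters
// changes the output.
final class FilterOrderSketch {
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("a", "the")); // stand-in for stopwords.txt

    static final UnaryOperator<List<String>> LOWER_CASE = tokens -> {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    };

    static final UnaryOperator<List<String>> STOP = tokens -> {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (!STOP_WORDS.contains(token)) {
                out.add(token);
            }
        }
        return out;
    };

    @SafeVarargs
    static List<String> chain(List<String> tokens, UnaryOperator<List<String>>... filters) {
        List<String> current = tokens;
        for (UnaryOperator<List<String>> filter : filters) {
            current = filter.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("The", "Happy", "Birthday");
        System.out.println(chain(tokens, LOWER_CASE, STOP)); // [happy, birthday]
        System.out.println(chain(tokens, STOP, LOWER_CASE)); // [the, happy, birthday]
    }
}
```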
Putting it all together
The following code illustrates a complete field type definition:
<fieldType name="my-text-type" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
To see concretely what happens during the index phase of a given field, open a shell in the top-level directory of the project associated with this chapter. Next, type the following command:
# mvn cargo:run -P fieldAnalysis
Tip: You can do the same in Eclipse by creating a new Maven Debug launch configuration. In the launch dialog, fill in the Goals input field with cargo:run and the Profile input field with fieldAnalysis.
That will start a Solr instance with an example schema that contains several types. Once Solr has started, open your browser and go to http://127.0.0.1:8983/solr/#/analysis/analysis. This page lets you simulate the index phase of a given value (the content of the left text area) for a given field or field type (selected in the drop-down menu at the bottom of the page).
Type some text in the Field Value (Index) text area, choose a field type or a field, and press the Analyse Values button. The page will show the input and output values of each member of the index chain. The following screenshot shows the resulting page after analyzing the text “Apache Solr” with a right_truncated_phrase field type:
Some example field types
This section lists and describes some important field types and their main features; it is not exhaustive. The schema.xml file in the download bundle contains a lot of examples covering all the available types.
In addition, a list of all field types is available at https://cwiki.apache.org/confluence/display/solr/Field+Types+Included+with+Solr.
String
The string type retains the incoming value as a single token.
Note: That doesn't mean the field cannot be indexed. It only means that the field cannot have a user-defined analysis chain.
This type is usually associated with the following:
Indexed fields: Fields that represent codes, classifications, and identifiers, such as A340, 853.92, SKU#22383, 3919928832, 292381, and en-US
Sort fields: Fields that can be used as sort criteria, such as authors, titles, and publication dates
Numbers
Solr defines several numeric types. They can be classified into three groups:
Basic types such as IntField, FloatField, and LongField. These are the legacy types that encode numeric values as strings.
Sortable field types such as SortableDoubleField, SortableIntField, and SortableLongField. These are the legacy types that encode numeric values as strings in a way that matches their natural numeric order (which differs from the lexicographic order of strings).
Trie field types such as TrieIntField, TrieFloatField, and TrieLongField. These types index numeric values at several tunable levels of precision in order to enable efficient range queries and sorting. Those levels are configured using a precisionStep attribute in the field type definition.
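The difference between lexicographic and numeric order, which motivated the sortable types, can be shown in a few lines. In this sketch, the zero-padding encoding is a hypothetical, simplified scheme that handles non-negative integers only. It is not the encoding that the Sortable or Trie types actually use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Sketch: numbers indexed as plain strings sort lexicographically, while an
// order-preserving string encoding restores numeric order. Zero-padding is a
// hypothetical, simplified encoding for non-negative ints only.
final class NumericOrderSketch {

    static List<String> sortAsStrings(List<Integer> numbers) {
        List<String> out = new ArrayList<>();
        for (int n : numbers) {
            out.add(Integer.toString(n));
        }
        Collections.sort(out); // lexicographic: "10" < "100" < "9"
        return out;
    }

    static String encodeSortable(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("This sketch handles non-negative values only");
        }
        return String.format(Locale.ROOT, "%010d", n); // 10 digits fit Integer.MAX_VALUE
    }

    static List<Integer> sortByEncoding(List<Integer> numbers) {
        List<String> encoded = new ArrayList<>();
        for (int n : numbers) {
            encoded.add(encodeSortable(n));
        }
        Collections.sort(encoded); // lexicographic order now equals numeric order
        List<Integer> out = new ArrayList<>();
        for (String e : encoded) {
            out.add(Integer.parseInt(e));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(9, 100, 10);
        System.out.println(sortAsStrings(values));  // [10, 100, 9]
        System.out.println(sortByEncoding(values)); // [9, 10, 100]
    }
}
```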
The first two groups, basic and sortable types, are deprecated and will soon be removed (most probably in Solr 5.0). This is because Trie types already include their features and characteristics, while being more efficient and providing a unified way of dealing with numbers.
Boolean
Boolean fields can have a value of true or false. Values of 1, t, or T are interpreted as true.
Date
Solr uses a restricted version of the ISO 8601 date and time format for dates, of the form YYYY-MM-DDThh:mm:ss.SSSZ. Here are some example values for this field type:
2005-09-27T14:43:11Z
2011-08-23T02:43:00.992Z
The Z character is a literal, trailing constant indicating that the date is expressed in UTC. Only the milliseconds are optional. If they are missing, the dot (.) after the seconds must be removed as well.
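A client can check values against that format before sending them. The following sketch is an approximation based on assumptions, not Solr's own parser. A regular expression enforces the shape: a mandatory trailing Z, and optional milliseconds together with their dot. java.time.Instant.parse then rejects values that have the right shape but are not real instants, such as a month of 13. The class name is hypothetical.

```java
import java.time.DateTimeException;
import java.time.Instant;
import java.util.regex.Pattern;

// Sketch: checks a value against the restricted ISO 8601 form
// YYYY-MM-DDThh:mm:ss[.SSS]Z. The regex enforces the shape; Instant.parse
// rejects shape-valid values that are not real instants.
final class SolrDateSketch {
    private static final Pattern SHAPE =
            Pattern.compile("\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}(\\.\\d{1,3})?Z");

    static boolean isValid(String value) {
        if (!SHAPE.matcher(value).matches()) {
            return false;
        }
        try {
            Instant.parse(value);
            return true;
        } catch (DateTimeException e) { // DateTimeParseException is a subclass
            return false;
        }
    }

    public static void main(String[] args) {
        for (String v : new String[] {"2005-09-27T14:43:11Z", "2011-08-23T02:43:00.992Z",
                                      "2005-09-27T14:43:11.Z", "2005-09-27 14:43:11"}) {
            System.out.println(v + " -> " + isValid(v));
        }
    }
}
```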
As with numbers, there are two types available to represent dates in Solr:
A basic DateField type, which is a deprecated legacy type
TrieDateField, which is the recommended date type
A useful feature of date types is a simple expression language that can be used to build dynamic date expressions, like this:
NOW+2YEARS
NOW+3YEARS-3DAYS
2005-09-27T14:43:00Z+1YEAR
The expression language allows the following keywords:
Keyword    Description
YEAR/YEARS    One or more years. These are basically synonyms; the plural only makes expressions more readable (for example, 2YEARS reads better than 2YEAR).
MONTH/MONTHS    One or more months (for example, NOW+4MONTHS, NOW-1MONTH).
DAY/DAYS/DATE    A day or a certain number of days (for example, NOW+1DAY).
HOUR/HOURS    An hour or a certain number of hours.
MINUTE/MINUTES    One or more minutes.
MILLI/MILLIS/MILLISECOND/MILLISECONDS    One or more milliseconds.
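The arithmetic part of these expressions can be sketched as a loop over +N or -N unit steps applied to a base date in UTC. This is a simplified, hypothetical evaluator that covers only the keywords in the table. NOW or the explicit date is passed in as the base, and other features of Solr's date math are not modeled.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified, hypothetical evaluator for the arithmetic part of date expressions
// such as "+3YEARS-3DAYS". The base (NOW or an explicit date) is passed in
// separately; only the keywords from the table are supported.
final class DateMathSketch {
    private static final Map<String, ChronoUnit> UNITS = new HashMap<>();
    private static final Pattern STEP = Pattern.compile("([+-])(\\d+)([A-Z]+)");

    private static void register(ChronoUnit unit, String... keywords) {
        for (String keyword : keywords) {
            UNITS.put(keyword, unit);
        }
    }

    static {
        register(ChronoUnit.YEARS, "YEAR", "YEARS");
        register(ChronoUnit.MONTHS, "MONTH", "MONTHS");
        register(ChronoUnit.DAYS, "DAY", "DAYS", "DATE");
        register(ChronoUnit.HOURS, "HOUR", "HOURS");
        register(ChronoUnit.MINUTES, "MINUTE", "MINUTES");
        register(ChronoUnit.MILLIS, "MILLI", "MILLIS", "MILLISECOND", "MILLISECONDS");
    }

    static ZonedDateTime apply(ZonedDateTime base, String math) {
        Matcher step = STEP.matcher(math);
        ZonedDateTime result = base;
        int consumed = 0;
        while (step.find()) {
            if (step.start() != consumed) {
                break; // unparsable text between steps; reported below
            }
            ChronoUnit unit = UNITS.get(step.group(3));
            if (unit == null) {
                throw new IllegalArgumentException("Unknown unit: " + step.group(3));
            }
            long amount = Long.parseLong(step.group(2));
            result = step.group(1).equals("+") ? result.plus(amount, unit) : result.minus(amount, unit);
            consumed = step.end();
        }
        if (consumed != math.length()) {
            throw new IllegalArgumentException("Cannot parse date math: " + math);
        }
        return result;
    }

    public static void main(String[] args) {
        ZonedDateTime now = ZonedDateTime.now(ZoneOffset.UTC);
        System.out.println("NOW+3YEARS-3DAYS -> " + apply(now, "+3YEARS-3DAYS"));
        ZonedDateTime explicit = ZonedDateTime.of(2005, 9, 27, 14, 43, 0, 0, ZoneOffset.UTC);
        System.out.println("2005-09-27T14:43:00Z+1YEAR -> " + apply(explicit, "+1YEAR"));
    }
}
```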
Text
Text is the basic type for fields that have a configurable text analysis. It is the only type that accepts analyzer chains in its configuration.
Other types
The following list briefly describes some other interesting types:
Currency: This type provides dedicated support for monetary values. It also lets you plug in different providers for determining exchange rates between currencies.
Binary: This type is used to handle binary data. Data is sent and retrieved as Base64-encoded strings.
Geospatial types: Two types are available to support geospatial searches. The first, LatLonType, has been available since Solr 3.x. The second, SpatialRecursivePrefixTreeFieldType, was introduced in Solr 4 and supports polygon shapes.
Random: This type is used to generate random sequences. It is useful if you want a pseudorandom sort order for indexed documents.
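Since binary values travel as Base64-encoded strings, a client has to encode bytes before sending them and decode stored values after reading them back. A minimal sketch using the JDK's java.util.Base64 (Java 8 or later) follows. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Sketch: the client-side conversions a binary field implies. Raw bytes are
// Base64-encoded before being sent and decoded after being read back.
final class BinaryFieldSketch {

    static String toFieldValue(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    static byte[] fromFieldValue(String value) {
        return Base64.getDecoder().decode(value);
    }

    public static void main(String[] args) {
        byte[] original = "Solr".getBytes(StandardCharsets.UTF_8);
        String sent = toFieldValue(original);
        System.out.println(sent);                                          // U29scg==
        System.out.println(Arrays.equals(original, fromFieldValue(sent))); // true
    }
}
```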
Fields
Fields are containers of values associated with a specific type. They represent the structure and composition of the entities in your domain model.
Put simply, fields are the attributes of the documents you're going to manage with Solr. For example, if Solr serves a library's Online Public Access Catalogue (OPAC), the entities in the schema will most probably represent books, with fields such as title, author, ISBN, cover, and so on.
Fields are declared in the schema. Each field declaration includes a name, a type, and a set of attributes. This is an example of a field declaration:
<field name="title" type="string" indexed="false" stored="true"
       required="true" multiValued="false"/>
The following table lists the attributes that can be specified for each field:
Keyword    Description
name    The name of the field. It must be unique in the schema and should consist only of alphanumeric and underscore characters. It should not start with a digit, and names with both a leading and a trailing underscore are reserved.
type    The type associated with the field.
indexed    If this is true, the field will be searchable, sortable, and facetable. It overrides the same setting on the associated type.
stored    If this is true, the field's values are retrievable. It overrides the same setting on the associated type.
required    Marks the field as mandatory in input documents.
default    A default value used at index time if the field has no value in the input document.
sortMissingFirst, sortMissingLast    Optional attributes defining the sort position of documents that have no value for the field. They override the same settings on the associated type.
omitNorms    Omits the norms associated with this field. Overrides the same attribute on the field type.
omitPositions    Omits the term positions associated with this field. Overrides the same attribute on the field type.
omitTermFreqAndPositions    Omits the term frequencies and positions associated with this field. Overrides the same attribute on the field type.
termVectors    Stores the term vectors. A term vector is a list of the document's terms and the number of times each occurs in that document.
docValues    Only available for String, Trie, and UUID fields. If this is true, the field's values are also stored in a column-oriented structure (a document-to-value mapping) that makes operations such as sorting and faceting more efficient.
Static fields
The first category includes fields that are statically declared in the schema. In this context, static simply means that the name of the field is explicitly known in advance. This is an example of a static field:
<field name="isbn" (other attributes follow)/>
Dynamic fields
In some situations, you don't know the names of some fields in the incoming documents in advance. Although this may sound strange, it is a rather frequent scenario.
Think about a document that represents a book and is the result of some kind of cataloguing. In general, a bibliographic record has a lot of fields. Some of them contain text that cataloguers can express in several languages. For example, you can have a book with these abstracts:
{
  "id": 92902893,
  "abstract_en": "This is the English summary",
  "abstract_es": "Éste es el resumen en español",
  (other fields follow)
}
You can have another book with the following definition:
{
  "id": 92902893,
  "abstract_it": "L'automazione della biblioteca digitale",
  (other fields follow)
}
So the question is: how can we define the abstract field (or fields) in our schema? A first approach could be to declare several static fields, one for each language, but this only works if we know all the input languages in advance. Moreover, it is not very extensible, because adding a new language (for example, abstract_ru) requires a change to the schema. Dynamic fields are the alternative.
A field is dynamic when its name includes a leading or trailing wildcard, which allows it to match incoming input fields dynamically. A dynamic field is declared using the <dynamicField> element, as follows:
<dynamicField name="abstract_*" (other attributes follow)/>
This field will catch all fields whose names start with abstract_. Hence, it avoids the need to define fields statically one by one and, most importantly, it will catch any abstract field regardless of its language suffix.
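The matching rule can be sketched as follows. The sketch assumes a single leading or trailing * per pattern and checks static fields first. It then tries longer patterns before shorter ones, keeping schema order for patterns of equal length, which is how the Solr documentation describes the precedence of dynamic fields. The names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch: resolves an incoming field name, first against static fields and then
// against dynamic patterns with a single leading or trailing '*'. Longer patterns
// are tried first; the stable sort keeps schema order for equal lengths.
final class DynamicFieldSketch {

    static boolean matches(String pattern, String fieldName) {
        if (pattern.startsWith("*")) {
            return fieldName.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            return fieldName.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(fieldName);
    }

    // Returns the static field name or dynamic pattern that handles fieldName,
    // or null when nothing matches (Solr would report an unknown field).
    static String resolve(Set<String> staticFields, List<String> dynamicPatterns, String fieldName) {
        if (staticFields.contains(fieldName)) {
            return fieldName;
        }
        List<String> longestFirst = new ArrayList<>(dynamicPatterns);
        longestFirst.sort((a, b) -> Integer.compare(b.length(), a.length()));
        for (String pattern : longestFirst) {
            if (matches(pattern, fieldName)) {
                return pattern;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Set<String> staticFields = new HashSet<>(Arrays.asList("id", "isbn", "title"));
        List<String> dynamicPatterns = Arrays.asList("*_s", "abstract_*");
        for (String name : Arrays.asList("isbn", "abstract_en", "abstract_ru", "publisher_s", "cover")) {
            System.out.println(name + " -> " + resolve(staticFields, dynamicPatterns, name));
        }
    }
}
```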
Copy fields
In the Solr schema, you can use a special copyField directive to copy one field to another. This is useful when a document has a given field and, starting from its value, you want to have other fields in your schema populated with the same value but with a different text analysis.
Let’s suppose your documents represent books that can contain two different kinds of authors:
persons (for example, Dante Alighieri and Leonardo Da Vinci)
corporates (for example, Association for Childhood Education International)
You must show those authors separately in the user interface, as part of customer requirements. You can give them dedicated labels, for example. At the same time, the customer wants to have an author search feature on the user interface that triggers a search for all kinds of authors. The following screenshot shows a GUI widget that is often used in these scenarios: a search toolbar with a drop-down menu that allows the user to constrain the scope of the search within a given context (for example, authors, subjects, and titles):
A first approach could be to have two stored and indexed fields. When the user searches for an author by typing a name or a surname, such terms will be searched within those two fields. The schema in this case should be as follows:
<field name="author_person" type="text" indexed="true" stored="true" …/>
<field name="author_corporate" type="text" indexed="true" stored="true" …/>
A second choice could be to have a more cohesive design by separating search and view responsibilities. In this case, we will have two stored (but not indexed) fields representing the two kinds of authors, and a generic indexed (but not stored) author_search field containing all the authors of a document, regardless of their kind. In this way, the user interface will use the stored fields for visualization, while Solr will use the catch-all author_search field for searches. This design introduces the copyField directive. Here is the corresponding schema:
<field name="author_person" type="string" indexed="false" stored="true"
    required="false" multiValued="true"/>
<field name="author_corporate" type="string" indexed="false" stored="true"
    required="false" multiValued="true"/>
<field name="author_search" type="text" indexed="true" stored="false"
    required="false" multiValued="true"/>
<copyField source="author_person" dest="author_search"/>
<copyField source="author_corporate" dest="author_search"/>
The copyField directive copies the incoming value of the source field into the dest field. As a result, the author_search field will contain all kinds of authors.
Note
In both the source and dest attributes, it’s possible to use a trailing or a leading wildcard, thereby avoiding repetitive code. In the preceding example, we could have just one copyField declaration:
<copyField source="author_*" dest="author_search"/>
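The following minimal Java sketch models the copyField behavior described above on a plain Map of field values. Every incoming value of a matching source field is appended to the destination field. CopyFieldSketch and CopyRule are hypothetical names, and the sketch ignores text analysis and schema validation entirely.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of copyField semantics on a Map-based document:
// raw incoming values of every matching source field are appended to dest.
final class CopyFieldSketch {

    static final class CopyRule {
        final String source; // exact name or a single leading/trailing '*' pattern
        final String dest;

        CopyRule(String source, String dest) {
            this.source = source;
            this.dest = dest;
        }
    }

    static boolean matches(String pattern, String name) {
        if (pattern.endsWith("*")) {
            return name.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        if (pattern.startsWith("*")) {
            return name.endsWith(pattern.substring(1));
        }
        return pattern.equals(name);
    }

    static Map<String, List<String>> apply(Map<String, List<String>> input, List<CopyRule> rules) {
        // Start from a copy of the input document.
        Map<String, List<String>> out = new LinkedHashMap<>();
        input.forEach((name, values) -> out.put(name, new ArrayList<>(values)));
        for (CopyRule rule : rules) {
            // Iterate over the original input only: copied values are never copied again.
            for (Map.Entry<String, List<String>> field : input.entrySet()) {
                if (matches(rule.source, field.getKey())) {
                    out.computeIfAbsent(rule.dest, d -> new ArrayList<>()).addAll(field.getValue());
                }
            }
        }
        return out;
    }
}
```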
Other schema sections
Other than fields and field types, the Solr schema contains some other things as well. This section briefly illustrates them.
Unique key
This field uniquely identifies your document. This is not strictly required but strongly recommended if you want to update your documents, avoid duplicates, and (last but not least) use Solr distributed features.
Default similarity
This element allows you to declare the factory of the class used by Solr to determine the score of documents while searching.
Solr indexing configuration
Once the schema has been defined, it’s time to configure and tune the indexing process by means of another file that resides in the same directory as the schema: solrconfig.xml.
The file contains a lot of sections, but fortunately, many parts are optional and have default values that usually work well in most scenarios. We will try to highlight the most important of them with respect to this chapter.
As a general note, it’s possible to use system properties and default values within this file. Therefore, we are able to create a dynamic expression, like this:
<dataDir>${my.data.dir:/var/data/defaultDataDir}</dataDir>
The value of the dataDir element will be replaced at runtime with the value of the my.data.dir system property, or with the default value of /var/data/defaultDataDir if that property doesn’t exist.
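As a rough illustration of how such an expression resolves, the following Java sketch substitutes ${name:default} placeholders using a java.util.Properties object. PropertySubstitution is a hypothetical helper, not Solr’s own resolver. Solr’s resolver may support more cases, for example nested expressions.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of ${name} / ${name:default} substitution.
final class PropertySubstitution {

    // Group 1: property name (no ':' or '}'); group 2: optional default value.
    private static final Pattern EXPR = Pattern.compile("\\$\\{([^}:]+)(?::([^}]*))?\\}");

    static String resolve(String text, Properties props) {
        Matcher m = EXPR.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = props.getProperty(m.group(1));
            if (value == null) {
                value = m.group(2); // fall back to the default, if any
            }
            if (value == null) {
                throw new IllegalArgumentException("No value for property " + m.group(1));
            }
            // quoteReplacement keeps '$' and '\' in values literal.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

In a real JVM you would pass System.getProperties(). Starting the JVM with -Dmy.data.dir=/opt/solr/data would then select that directory instead of the default.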
General settings
The heading part of the solrconfig.xml file contains general settings that are not strictly related to the index phase.
The first is the Lucene match version:
<luceneMatchVersion>LUCENE_47</luceneMatchVersion>
This controls which Lucene version’s behavior Solr emulates internally. This is useful for managing migration phases towards newer versions of Solr, thus allowing backward compatibility with indexes built with previous versions.
A second piece of information you can set here is the data directory, that is, the directory where Solr will create and manage the index. It defaults to a directory called data under the core’s instance directory.
<dataDir>/var/data/defaultDataDir</dataDir>
Index configuration
The section within the <indexConfig> tag contains a lot of things that you can configure in order to fine-tune the Solr index phase.
A curious thing you can see in this section, in the solrconfig.xml file of the example core, is that most things are commented out. This is worth noting, because it means that Solr provides good default values for those settings.
The following table summarizes the settings you will find within the <indexConfig> section:
Attribute  Description
writeLockTimeout  The maximum time allowed to wait for a write lock on an IndexWriter.
maxIndexingThreads  The maximum number of threads allowed to index documents in parallel. Once this threshold has been reached, incoming requests will wait until there’s an available slot.
useCompoundFile  If this is set to true, Solr will use a single compound file to represent the index. The default value is false.
ramBufferSizeMB  When accumulated document updates exceed this memory threshold, all pending updates are flushed.
maxBufferedDocs  This has the same behavior as the previous attribute, but the threshold is defined as the count of document updates.
mergePolicy  The name of the class, along with its settings, that defines and implements the merge strategy.
mergeFactor  A threshold indicating how many segments an index is allowed to have before they are merged into one segment. Each time an update is made, it is added to the most recent index segment. When that segment fills up (that is, when the maxBufferedDocs or ramBufferSizeMB threshold is reached), a new segment is created and subsequent updates are inserted there. Once the number of segments reaches this threshold, Solr will merge all of them into one segment.
mergeScheduler  The class that is responsible for controlling how merges are executed.
lockType  The lock type used by Solr to indicate that a given index is already owned by an IndexWriter.
Update handler and autocommit feature
The <updateHandler> section configures the component that is responsible for handling requests to update the index.
This is where you can tell Solr to periodically run unsolicited commits, so that clients won’t need to do that explicitly while indexing. Auto-commits can be triggered by two different thresholds:
maxDocs: The maximum number of documents to add since the last commit
maxTime: The maximum amount of time (in milliseconds) that can pass after a document has been added before a commit is triggered
They are not exclusive, so it’s perfectly legal to have settings such as these:
<autoCommit>
  <maxDocs>5000</maxDocs>
  <maxTime>300000</maxTime>
</autoCommit>
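The rule that combines the two thresholds can be sketched in a few lines of Java: a commit is due as soon as either limit is reached. AutoCommitPolicy is a hypothetical class for illustration. It checks the thresholds on demand, whereas Solr’s own commit tracking is implemented differently.

```java
// Hypothetical sketch of the autoCommit trigger rule: a commit is due
// when either configured threshold is reached (non-positive = not set).
final class AutoCommitPolicy {
    private final int maxDocs;    // <maxDocs>
    private final long maxTimeMs; // <maxTime>, in milliseconds

    AutoCommitPolicy(int maxDocs, long maxTimeMs) {
        this.maxDocs = maxDocs;
        this.maxTimeMs = maxTimeMs;
    }

    /**
     * @param pendingDocs            documents added since the last commit
     * @param msSinceFirstPendingAdd time elapsed since the oldest uncommitted add
     */
    boolean commitDue(int pendingDocs, long msSinceFirstPendingAdd) {
        if (pendingDocs == 0) {
            return false; // nothing to commit
        }
        boolean docsReached = maxDocs > 0 && pendingDocs >= maxDocs;
        boolean timeReached = maxTimeMs > 0 && msSinceFirstPendingAdd >= maxTimeMs;
        return docsReached || timeReached;
    }
}
```

With the configuration above (5000 documents, 300000 ms), 100 pending documents added one second ago do not trigger a commit. A single document pending for five minutes does.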
Starting from Solr 4.0, there are two kinds of commit. A hard commit flushes the uncommitted documents to the index, therefore creating and changing segments and data files on the disk. The other type is called soft commit, which doesn’t actually write uncommitted changes but just reopens the internal Solr searcher in order to make uncommitted data in memory available for searches.
Hard commits are expensive, but after their execution, data is permanently part of the index. Soft commits are fast but transient, so in case of a system crash, changes are lost.
Hard and soft commits can coexist in a Solr configuration. The following is an example that shows this:
<autoCommit>
  <maxTime>900000</maxTime>
</autoCommit>
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>
Here, a soft commit will be triggered every second (1000 milliseconds), and a hard commit will run every 15 minutes (900000 milliseconds).
RequestHandler
A RequestHandler instance is a pluggable component that handles incoming requests. It is configured in solrconfig.xml as a specific endpoint by means of its name attribute.
Requests sent to Solr can belong to several categories: search, update, administration, and stats. In this context, we are interested in those handlers that are in charge of handling index update requests. Although not mandatory, those handlers are usually associated with a name starting with the /update prefix, for example, the default handler you will find in the configuration:
<requestHandler name="/update" class="solr.UpdateRequestHandler"/>
Prior to Solr 4, each kind of input format (for example, JSON, XML, and so on) required a dedicated handler to be configured. Now the general-purpose update handler, that is, the /update handler, uses the content type of the incoming request in order to detect the format of the input data. The following table lists the built-in content types:
Mime-type  Description
application/xml, text/xml  XML messages
application/json, text/json  JSON messages
application/csv, text/csv  Comma-separated values
application/javabin  Java-serialized objects (Java clients only)
Each format has its own way of encoding the kind of update operation (for example, add, delete, and commit) and the input documents. This is a sample add command in XML:
<add>
  <doc>
    <field name="id">12020</field>
    <field name="title">Round around midnight</field>
  </doc>
  …
</add>
Later, we will index some data using different techniques and different formats.
UpdateRequestProcessor
The write path of the index process has been conceived by Solr developers with modularity and extensibility in mind. Specifically, the index process has been structured as a chain of responsibility, where each component adds its own contribution to the whole index process.
The UpdateRequestProcessor chain is an important configurable aspect of the index process. If you want to declare your custom chain, you need to add a corresponding section within the configuration. This is an example of a custom chain:
<updateRequestProcessorChain name="my-index-chain">
  <processor class="…"/>
  <processor class="…">
    <str name="aParameterName">aParameterValue</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
Defining a new chain requires a name and a set of UpdateRequestProcessorFactory components that are in charge of creating processor instances for that chain. RunUpdateProcessorFactory, which actually writes to the index, is usually declared last.
Note
Actually, the definition of the chain is not enough. It must be enabled (that is, associated with a RequestHandler) in the following way:
<requestHandler name="/myReqHandler"
    class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">my-index-chain</str>
  </lst>
</requestHandler>
There are a lot of already implemented UpdateRequestProcessor components that you can use in your chain, but in general, it’s quite easy to create your own processor and customize the index chain.
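The chain-of-responsibility structure can be modeled without Solr. In the following Java sketch, Processor, ProcessorFactory, TrimFieldFactory, CollectingFactory, and Chains are hypothetical stand-ins for UpdateRequestProcessor, UpdateRequestProcessorFactory, and the chain declared in solrconfig.xml. CollectingFactory plays the role of solr.RunUpdateProcessorFactory: it collects documents into a list instead of writing an index.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for UpdateRequestProcessor: holds the next link.
abstract class Processor {
    private final Processor next;

    Processor(Processor next) {
        this.next = next;
    }

    // Default behavior: forward the document to the next processor, if any.
    void processAdd(Map<String, Object> doc) {
        if (next != null) {
            next.processAdd(doc);
        }
    }
}

// Hypothetical stand-in for UpdateRequestProcessorFactory.
interface ProcessorFactory {
    Processor getInstance(Processor next);
}

// Example processor: trims a configured text field, then forwards.
final class TrimFieldFactory implements ProcessorFactory {
    private final String field;

    TrimFieldFactory(String field) {
        this.field = field;
    }

    @Override
    public Processor getInstance(Processor next) {
        return new Processor(next) {
            @Override
            void processAdd(Map<String, Object> doc) {
                Object value = doc.get(field);
                if (value instanceof String) {
                    doc.put(field, ((String) value).trim());
                }
                super.processAdd(doc); // without this call, nothing downstream runs
            }
        };
    }
}

// Terminal processor playing the role of solr.RunUpdateProcessorFactory:
// it "indexes" by collecting documents into a list.
final class CollectingFactory implements ProcessorFactory {
    private final List<Map<String, Object>> sink;

    CollectingFactory(List<Map<String, Object>> sink) {
        this.sink = sink;
    }

    @Override
    public Processor getInstance(Processor next) {
        return new Processor(next) {
            @Override
            void processAdd(Map<String, Object> doc) {
                sink.add(doc);
                super.processAdd(doc);
            }
        };
    }
}

final class Chains {
    // Builds the chain back to front so each factory receives its successor,
    // and returns the first processor in declaration order.
    static Processor build(List<ProcessorFactory> factories) {
        Processor next = null;
        for (int i = factories.size() - 1; i >= 0; i--) {
            next = factories.get(i).getInstance(next);
        }
        return next;
    }
}
```

Each factory receives its successor, so the chain is built from the last declared processor backwards. If TrimFieldFactory’s processor skipped super.processAdd(doc), no document would ever reach the collecting processor.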
Tip
The example project with this chapter contains several examples of UpdateRequestProcessor within the org.gazzax.labs.solr.ase.ch2.urp package.
Index operations
This section shows you the basic commands needed for updating an index, by adding or removing documents. As a general note, each command we will see can be issued in at least two ways. The first is the command line, for example through the cURL tool (a built-in tool in a lot of Linux distributions and available for all platforms). The second is code (that is, SolrJ or some other client API). When you want to add documents, it’s also possible to run those commands from the administration console.
Note
SolrJ and client APIs will be covered later in a dedicated chapter.
Another common aspect of these interactions is the Solr response, which always contains a status and a QTime attribute. The status is the return code of the executed command, which is always 0 if the operation succeeds. The QTime attribute is the elapsed time of the execution, in milliseconds. This is an example of the response in XML format:
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">97</int>
  </lst>
</response>
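A client usually checks the status before trusting the result. This minimal Java sketch reads status and QTime from a response like the one above, using only the JDK’s DOM parser. UpdateResponseReader is a hypothetical helper. Client APIs such as SolrJ expose these values directly.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper that reads <int> values from the responseHeader
// of an XML Solr response, using only the JDK's DOM parser.
final class UpdateResponseReader {

    static int headerInt(String xml, String name) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed XML response", e);
        }
        NodeList lists = doc.getElementsByTagName("lst");
        for (int i = 0; i < lists.getLength(); i++) {
            Element lst = (Element) lists.item(i);
            if (!"responseHeader".equals(lst.getAttribute("name"))) {
                continue; // only look inside the response header
            }
            NodeList ints = lst.getElementsByTagName("int");
            for (int j = 0; j < ints.getLength(); j++) {
                Element e = (Element) ints.item(j);
                if (name.equals(e.getAttribute("name"))) {
                    return Integer.parseInt(e.getTextContent().trim());
                }
            }
        }
        throw new IllegalArgumentException("No <int name=\"" + name + "\"> in responseHeader");
    }

    // A status of 0 means the command succeeded.
    static boolean succeeded(String xml) {
        return headerInt(xml, "status") == 0;
    }
}
```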
Add
The command sends one or more documents to add to Solr. The documents that are added are not visible until a commit or an optimize command is issued.
We already saw that documents are the unit of information in Solr. Here, depending on the format of the data, one or more documents are sent using the proper representation.
Since the attributes and the content of the message will be the same regardless of the format, the formal description of the message structure will be given once. The following is an add command in XML format:
<add commitWithin="10000" overwrite="true">
  <doc boost="1.9">
    <field name="id">12020</field>
    <field name="title" boost="2.2">Round around midnight</field>
    <field name="subject">Music</field>
    <field name="subject">Jazz</field>
  </doc>
  …
</add>
Let’s discuss the preceding command in detail:
<add>: This is the root tag of the XML document and indicates the operation.
commitWithin: This is an alternative to the autocommit features we saw previously. Using this optional attribute, the requestor asks Solr to ensure that the documents will be committed within a given period of time (in milliseconds).
overwrite: This tells Solr to check for documents with the same uniqueKey and, if needed, overwrite them. If you don’t have a uniqueKey, or you’re confident that you won’t ever add the same document twice, you can get some index performance improvements by explicitly setting this flag to false.
<doc>: This represents the document to be added.
boost (on <doc>): This is an optional attribute that specifies the boost for the whole document (that is, for each field). It defaults to 1.0.
<field>: This is a field of the document with just one value. If the field is multivalued, there will be several fields with the same name and different values.
boost (on <field>): This is an optional attribute that specifies the boost for the specific field. It defaults to 1.0.
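The same data can be expressed in JSON as follows:
{
  "add": {
    "commitWithin": 10000,
    "overwrite": true,
    "boost": 1.9,
    "doc": {
      "id": 12020,
      "title": {
        "value": "Round around midnight",
        "boost": 2.2
      },
      "subject": ["Music", "Jazz"]
    }
  }
}
As you can see, the information is the same as in the previous example. The difference is in the encoding of the information according to the JSON format. Note that in JSON the document boost belongs to the add command, next to commitWithin and overwrite; inside doc, a boost key would be read as a field.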
Sending add commands
We can issue an add command in several ways: using cURL, the administration console, or a client API such as SolrJ.
The cURL tool is a command-line tool used to transfer data with URL syntax. Among other protocols, it supports HTTP and HTTPS, so it’s perfect for sending commands to Solr. These are some examples of add commands sent using cURL:
# curl http://127.0.0.1:8983/solr/update -H "Content-type:text/xml" --data-binary @datafile.xml
# curl http://127.0.0.1:8983/solr/update -H "Content-type:text/xml" --data-binary '<add commitWithin="10000" overwrite="true">
  <doc boost="1.9">
    <field name="id">12020</field>
    …
    <field name="subject">Jazz</field>
  </doc>
  …
</add>'
The first example uses data contained in a file. The second (useful for short requests) directly embeds the documents in the data-binary parameter. The preceding examples are perfectly valid for JSON and CSV documents as well (obviously, the data format and the content type will change).
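For readers who prefer code to the command line, this Java 11+ sketch builds the same kind of request as the cURL examples above, using the JDK’s java.net.http client: a POST to /update with an explicit Content-type header. SolrUpdatePoster is a hypothetical name. The base URL assumes the single-core setup used in these examples, and this is plain JDK code, not SolrJ.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical sketch: the Java equivalent of
// curl <baseUrl>/update -H "Content-type:text/xml" --data-binary '<payload>'
final class SolrUpdatePoster {

    static HttpRequest xmlUpdate(String baseUrl, String xmlPayload) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/update"))
                .header("Content-type", "text/xml")
                .POST(HttpRequest.BodyPublishers.ofString(xmlPayload))
                .build();
    }

    // Sends the request and returns the raw response body (needs a running Solr).
    static String send(HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Against a running instance, SolrUpdatePoster.send(SolrUpdatePoster.xmlUpdate("http://127.0.0.1:8983/solr", "<commit/>")) would return an XML response like the one shown earlier in this section.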
Delete
A delete command will mark one or more documents as deleted. This means the target documents are not immediately removed from the index. Instead, a kind of tombstone is placed on them. After the next commit, they are no longer visible to searches, and the space they occupy is reclaimed later, when the segments containing them are merged. Commits and optimizes are commands that make the update changes visible and available. In other words, they make those changes effectively part of the Solr index. We will see both of them later.
Solr allows us to identify the target documents in two different ways: by specifying a set of identifiers or by deleting all documents matched by a query. In the same way as we sent add commands, we can use cURL to issue delete commands:
# curl http://127.0.0.1:8983/solr/update -H "Content-type:text/xml" --data-binary @datafile_with_deletes.xml
# curl http://127.0.0.1:8983/solr/update -H "Content-type:text/xml" --data-binary '<delete>
  <id>92392</id>
  <query>publisher:"Ashler"</query>
</delete>'
In the second example, we issued a command to delete:
The document with 92392 as uniqueKey
All documents that have a publisher field with the Ashler value
Commit, optimize, and rollback
Changes resulting from add and delete operations are not immediately visible. They must be committed first; that is, a commit command has to be sent.
We already explored hard and soft unsolicited commits in the Update handler and autocommit feature section. The same command can be explicitly sent to Solr by clients.
Although we previously described the difference between hard and soft commits, it’s important to remember that a hard commit is an expensive operation, causing changes to be permanently flushed to disk. Soft commits operate exclusively in memory, and are therefore very fast but transient. In the event of a JVM crash, softly committed data is lost.
Tip
In a prototype I’m working on, we index data coming from traffic sensors in Solr. As you can imagine, the input flow is continuous; it can happen several times in a second. A control system needs to execute a given set of queries at short periodic intervals, for example, every few seconds. In order to make the most up-to-date data available to that system, we issue a soft commit every second and a hard commit every 20 minutes. At the moment, this seems to be a good compromise between the availability of fresh data and the risk of data loss (which could still happen during those 20 minutes).
For those interested, the Solr extension we will use in that project is available on GitHub, at https://github.com/agazzarini/SolRDF. It allows Solr to index RDF data, and it is a good example of the capabilities of Solr in the realm of customization.
A third kind of commit, which is actually a hard commit, is the so-called optimize. With optimize, other than producing the same results as those of a hard commit, Solr will merge the current index segments into a single segment, resulting in a set of intensive I/O operations. The merge usually occurs in the background and is controlled by parameters such as merge scheduler, merge policy, and merge factor. Optimize is even more expensive than a hard commit in terms of I/O: apart from costing the same as a hard commit, it needs some temporary space on disk to perform the merge.
It is possible to send the commit or the optimize command together with the data to be indexed:
# curl "http://127.0.0.1:8983/solr/update?commit=true" -H "Content-type:text/xml" --data-binary @datafile.xml
# curl "http://127.0.0.1:8983/solr/update?optimize=true" -H "Content-type:text/xml" --data-binary @datafile.xml
The message payload can also be a commit command:
# curl http://127.0.0.1:8983/solr/update -H "Content-type:text/xml" --data-binary '<commit/>'
A commit has a few additional Boolean parameters that can be specified to customize the service behavior:
Parameter  Description
waitSearcher  The command won’t return until a new searcher is opened and registered as the main searcher
waitFlush  The command won’t return until uncommitted changes are flushed to disk
softCommit  If this is true, a soft commit will be executed
Before committing any pending change, it’s possible to issue a rollback to remove uncommitted add and delete operations. The following are examples of rollback requests:
# curl "http://127.0.0.1:8983/solr/update?rollback=true"
# curl http://127.0.0.1:8983/solr/update -H "Content-type:text/xml" --data-binary '<rollback/>'
Extending and customizing the index process
As we saw before, the Solr index chain is highly customizable at different points. This section will give you some hints and examples for creating your own extensions in order to customize the indexing phase.
Changing the stored value of fields
One of the most frequent needs that I encounter while indexing bibliographic data is to correct or change the headings (labels) belonging to the incoming records (documents).
Note
This has nothing to do with the text analysis we have previously seen. Here, we are dealing with unwanted (wrong) values, diacritics that need to be replaced, or in general, labels in the original record that we want to change and show to the end users. In Solr terms, we want to change the stored value of a field before it gets indexed.
Suppose a library has a lot of records and wants to publish them in an OPAC. Unfortunately, many of those records have titles with a trailing underscore, which has a special meaning for librarians. While this is not a problem for the cataloguing software (because librarians are aware of that convention), it is not acceptable to end users, and it will surely be seen as a typo. So if we have records with titles such as “A good old story_” or “This is another title_” in our application, we want to show “A good old story” and “This is another title” without underscores when the user searches for those records.
Remember that analyzers and tokenizers declared in your schema only act on the indexed value of a given field. The stored value is copied verbatim as it arrives, so there’s no chance to modify it once it is indexed.
In these cases, an UpdateRequestProcessor perfectly fits our needs. The example project associated with this chapter contains several examples of custom UpdateRequestProcessors. Here, we are interested in RemoveTrailingUnderscoreProcessor, which can be found under src/main/java, in the org.gazzax.labs.solr.ase.ch2.urp package.
Writing an UpdateRequestProcessor requires two classes to be implemented:
Factory: A class that extends org.apache.solr.update.processor.UpdateRequestProcessorFactory
Processor: A class that extends org.apache.solr.update.processor.UpdateRequestProcessor
The first is a factory that creates concrete instances of your processor and can be configured with a set of custom parameters in solrconfig.xml:
<processor class="org.gazzax.labs.solr.ase.ch2.urp.RemoveTrailingUnderscoreProcessorFactory">
  <arr name="fields">
    <str name="fields">title</str>
    <str name="fields">author</str>
  </arr>
</processor>
In this case, instead of hardcoding the names of the fields that we want to check, we define an array parameter called fields. That parameter is retrieved in the factory, specifically in the init() method, which will be called by Solr when the factory is instantiated:
private String[] fields;

@Override
public void init(NamedList args) {
    SolrParams parameters = SolrParams.toSolrParams(args);
    this.fields = parameters.getParams("fields");
}
The other relevant section of the factory is the getInstance method, where a new instance of the processor is created:
@Override
public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
        UpdateRequestProcessor next) {
    return new RemoveTrailingUnderscoreProcessor(next, fields);
}
A new processor instance is created with the next processor in the chain and the list of target fields we configured. The processor receives those parameters and can now add its contribution to the index phase. In this case, we want to put some logic before the add phase:
@Override
public void processAdd(final AddUpdateCommand command) throws IOException {
    // 1. Retrieve the Solr (Input) Document
    SolrInputDocument document = command.getSolrInputDocument();

    // 2. Loop through target fields
    for (String name : fields) {

        // 3. Get the field value
        // (we assume target fields are monovalued for simplicity)
        Object value = document.getFieldValue(name);

        // 4. Check and, if needed, change the value
        if (value instanceof String && ((String) value).endsWith("_")) {
            String stringValue = (String) value;
            document.setField(name, stringValue.substring(0, stringValue.length() - 1));
        }
    }

    // 5. IMPORTANT: forward to the next processor in the chain
    super.processAdd(command);
}
Tip
You can find the source code of the whole example under the org.gazzax.labs.solr.ase.ch2.urp package of the source folder in the project associated with this chapter. The package contains additional examples of UpdateRequestProcessor.
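The processor above assumes monovalued fields. The following plain-Java helper sketches how the same cleanup could also handle multivalued fields. For testability it works on a Map-based document model, which is an assumption of this sketch; a real processor would read and write the SolrInputDocument instead. TrailingUnderscore is a hypothetical class name.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: strip one trailing underscore from single-valued
// and multivalued text fields of a Map-based document.
final class TrailingUnderscore {

    static String strip(String value) {
        return value.endsWith("_") ? value.substring(0, value.length() - 1) : value;
    }

    static void clean(Map<String, Object> doc, String... fields) {
        for (String name : fields) {
            Object value = doc.get(name);
            if (value instanceof String) {
                doc.put(name, strip((String) value));
            } else if (value instanceof Collection) {
                // Multivalued field: clean each text value, keep others as-is.
                List<Object> cleaned = new ArrayList<>();
                for (Object v : (Collection<?>) value) {
                    cleaned.add(v instanceof String ? strip((String) v) : v);
                }
                doc.put(name, cleaned);
            }
            // Missing fields and non-text values are left untouched.
        }
    }
}
```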
Indexing custom data
The default UpdateRequestHandler is very powerful because it covers the most popular data formats. However, there are some cases where data is available in a legacy format. Hence, we need to do something in order to have Solr working with that.
In this example, I will use a flat file, that is, a simple text file that typically describes records with fields of data defined by fixed positions. They are very popular in integration projects between banks and ERP systems (just to give you a concrete context).
Tip
In the example project associated with this chapter, you can find an example of such a file describing books under the src/solr/solr-homes/flatIndexer/example-input-data folder.
Here, each line has a fixed length of 107 characters and represents a book, with the following format:
Field  Position
Id  0 to 8
ISBN  8 to 22
Title  22 to 67
Author  67 to 106
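Before looking at the Solr-specific loader, here is a minimal, Solr-independent Java sketch of parsing one record. It reads the offsets in the table as String.substring start and end offsets, with the author running to the end of the line, as the loader below does. FlatRecordParser is hypothetical, and trimming the padding is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: parse one fixed-width book record into named fields.
final class FlatRecordParser {
    static final int RECORD_LENGTH = 107;

    static Map<String, String> parse(String line) {
        if (line.length() != RECORD_LENGTH) {
            throw new IllegalArgumentException(
                    "Expected " + RECORD_LENGTH + " chars, got " + line.length());
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", line.substring(0, 8).trim());
        fields.put("isbn", line.substring(8, 22).trim());
        fields.put("title", line.substring(22, 67).trim());
        fields.put("author", line.substring(67).trim()); // rest of the line
        return fields;
    }
}
```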
There are two approaches in this scenario. The first moves the responsibility to the client side, creating a custom indexer client that gets the data in any format and converts it into one of the supported formats. We won’t cover this scenario right now, as we will discuss client APIs in a later chapter.
Another approach could be a custom extension of the UpdateRequestHandler. In this case, we want to have a new content type (text/plain) and a corresponding custom handler to load that kind of data. There are two things we need to implement. The first is a subclass of the existing UpdateRequestHandler:
public class FlatDataUpdate extends UpdateRequestHandler {
    @Override
    protected Map<String, ContentStreamLoader> createDefaultLoaders(NamedList n) {
        Map<String, ContentStreamLoader> registry = new HashMap<String, ContentStreamLoader>();
        registry.put("text/plain", new FlatDataLoader());
        return registry;
    }
}
Here, we are simply overriding the content type registry (the registry in the superclass cannot be modified) to add our content type, with a corresponding handler called FlatDataLoader. This class extends ContentStreamLoader and implements the parsing logic of the flat data:
public class FlatDataLoader extends ContentStreamLoader
The custom loader must provide a load(…) method to implement the stream parsing logic:
@Override
public void load(
        SolrQueryRequest req,
        SolrQueryResponse rsp,
        ContentStream stream,
        UpdateRequestProcessor processor) throws Exception {
    // 1. Get a reader associated with the content stream
    BufferedReader reader = null;
    try {
        reader = new BufferedReader(stream.getReader());
        String actLine = null;
        while ((actLine = reader.readLine()) != null) {
            // 2. Sanity check: skip lines with an unexpected length
            if (actLine.length() != 107) {
                continue;
            }

            // 3. Parse and create the document
            SolrInputDocument doc = new SolrInputDocument();
            doc.setField("id", actLine.substring(0, 8));
            doc.setField("isbn", actLine.substring(8, 22));
            doc.setField("title", actLine.substring(22, 67));
            doc.setField("author", actLine.substring(67));

            // 4. Wrap the document in an add command and send it down the chain
            AddUpdateCommand command = getAddCommand(req);
            command.solrDoc = doc;
            processor.processAdd(command);
        }
    } finally {
        // Close the reader
        …
    }
}
If you want to view this example, just open the command line in the folder of the project associated with this chapter, and run the following command:
# mvn cargo:run -PflatIndexer
Tip
You can do the same with Eclipse by creating a new Maven launch as previously described. In that case, you will also be able to put debug breakpoints in the source code (your source code and the Solr source code) and proceed step by step through the Solr index process.
Once Solr has started, open another shell, change to the project folder, and run the following command:
# curl "http://127.0.0.1:8983/solr/flatIndexer/update?commit=true" -H "Content-type:text/plain" --data-binary @src/solr/solr-homes/flatIndexer/example-input-data/books.flat
You should see something like this in the console:
[UpdateHandler] start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
…
[SolrCore] SolrDeletionPolicy.onCommit: commits: num=2
[SolrCore] newest commit generation = 4
[SolrIndexSearcher] Opening Searcher@77ee04bb[flatIndexer] main
[UpdateHandler] end_commit_flush
Now open the administration console at http://127.0.0.1:8983/solr/#/flatIndexer/query, and click on the Execute Query button. You should see three documents in the right pane.
Tip
You can find the source code of the entire example under the org.gazzax.labs.solr.ase.ch2.handler package of the source folder in the project associated with this chapter.
Troubleshooting
This section provides suggestions and tips on how to resolve some common problems encountered when dealing with indexing operations.
Multivalued fields and the copyField directive
The cardinality of a field can be tricky, especially when used in conjunction with copyField directives, where two or more single-valued fields are copied to another field, like this:
<field name="author_person" … required="true"/>
<field name="author_corporate" … required="true"/>
<field name="author_search" … multiValued="true"/>
<copyField source="author_person" dest="author_search"/>
<copyField source="author_corporate" dest="author_search"/>
In this case, the destination field must be multivalued. Otherwise, it would receive two values (one from each source field), and Solr would refuse to index the whole document, reporting ERROR multiple values encountered for non multiValued field author_search.
The copyField input value
A common misunderstanding with the copyField directive is related to the value that is being copied from the source to the dest field. Suppose you define field A, field B, and a copyField directive from A to B:
<field name="A" type="text_without_stopwords" …/>
<field name="B" type="light_stemmed_text" …/>
<copyField source="A" dest="B"/>
Irrespective of the text analysis we defined for field A and field B, field B will get the original incoming value of field A, without any text analysis applied. In other words, the incoming value for field A is copied verbatim to field B before any text analysis takes place.
So, suppose field A receives the value “one and two”, and “and” is considered a stopword. The “one and two” value is injected into field A, which triggers the text analysis for the text_without_stopwords type. This results in an indexed value (for field A) composed of two tokens: “one” and “two” (“and” has been removed).
Next, the original value of field A (“one and two”) is copied to field B, triggering the text analysis associated with that field.
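The following Java sketch replays this example with two toy analyzers, showing that field B is analyzed from the raw input rather than from field A’s tokens. CopyFieldAnalysis and its analyzers are hypothetical stand-ins for the text_without_stopwords and light_stemmed_text types. Real Solr analysis chains are declared in the schema, not written like this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch: two fields analyze the same raw copyField input.
final class CopyFieldAnalysis {

    static final Set<String> STOPWORDS = Set.of("and", "the", "of");

    // Toy "text_without_stopwords": whitespace split, lowercase, drop stopwords.
    static final Function<String, List<String>> WITHOUT_STOPWORDS = text -> {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\s+")) {
            if (!t.isEmpty() && !STOPWORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    };

    // Toy "light_stemmed_text": whitespace split, lowercase, naive plural stripping.
    static final Function<String, List<String>> LIGHT_STEMMED = text -> {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\s+")) {
            if (t.isEmpty()) {
                continue;
            }
            tokens.add(t.length() > 3 && t.endsWith("s") ? t.substring(0, t.length() - 1) : t);
        }
        return tokens;
    };

    /** Indexes field A and, through copyField, field B from the same raw value. */
    static List<List<String>> indexAandB(String rawValue) {
        List<String> a = WITHOUT_STOPWORDS.apply(rawValue);
        // copyField: B receives rawValue, NOT the tokens produced for A.
        List<String> b = LIGHT_STEMMED.apply(rawValue);
        return List.of(a, b);
    }
}
```

With “one and two”, field A ends up with the tokens one and two, while field B keeps and, because B never sees A’s output.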
Required fields and the copyField directive
A required attribute on a static field denotes that an incoming document must contain a valid value for that field. If a field is the target (destination) of a copyField directive, the required attribute means that there must be a value for that field coming from at least one of its sources. See the following example:
<field name="A" … required="false"/>
<field name="B" … required="false"/>
<field name="C" … required="true" multiValued="true"/>
<copyField source="A" dest="C"/>
<copyField source="B" dest="C"/>
Fields A and B are not required, and they are copied into field C. Since field C is mandatory, you have to make sure that, for each input document, at least A or B has a valid value; otherwise, Solr will complain about a missing value for field C.
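Two constraints on a copyField destination are discussed in this troubleshooting section: required and multiValued. Both can be checked with a small Java sketch once the destination’s values have been collected from its sources. CopyFieldValidation and its messages are hypothetical simplifications, not Solr’s actual validation code or error text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: collect copyField values for a destination field,
// then apply the required and multiValued checks to the result.
final class CopyFieldValidation {

    static final class FieldDef {
        final boolean required;
        final boolean multiValued;

        FieldDef(boolean required, boolean multiValued) {
            this.required = required;
            this.multiValued = multiValued;
        }
    }

    /** Returns the list of validation errors (empty if the document is acceptable). */
    static List<String> copyAndValidate(Map<String, List<String>> doc, List<String> sources,
                                        String dest, FieldDef destDef) {
        List<String> values = new ArrayList<>();
        for (String src : sources) {
            values.addAll(doc.getOrDefault(src, List.of())); // absent sources add nothing
        }
        List<String> errors = new ArrayList<>();
        if (destDef.required && values.isEmpty()) {
            errors.add("missing required field: " + dest);
        }
        if (!destDef.multiValued && values.size() > 1) {
            errors.add("multiple values encountered for non multiValued field " + dest);
        }
        return errors;
    }
}
```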
Stored text is immutable!
A stored field value is the text that comes from the Solr (Input) document. It is copied verbatim, exactly as it arrives. Any text analysis configured in the schema for a given field type won’t affect that value.
In other words, the stored value won’t be changed at all by Solr during the index phase.
Data not indexed
The design of UpdateRequestProcessor follows the decorator pattern, consisting of a nested chain of responsibility where each link is executed one after the other. Your custom UpdateRequestProcessor will get a reference to the next processor in the chain during its lifecycle. Once its work has been done, it is crucial to forward the execution flow to the next processor, typically by calling the corresponding super method, such as super.processAdd(command). Otherwise, the chain will be interrupted and no data will be indexed.
Summary
In this chapter, we saw the main concepts of the indexing phase in Solr. Being an inverted-index-based search engine, Solr strongly relies on the indexing phase by allowing a customizable and tunable index chain.
The Solr write path is a chain of responsibility consisting of several actors, each of them with a precise role in the overall process. While you must know, configure, and control those components as a user, you must also be aware of their high level of extensibility (as a developer). This allows you to adapt and, if needed, customize a Solr instance according to your specific needs.
We addressed the concepts that form the Solr data model, such as documents, core, schema, fields, and types. We also looked at the indexing configuration and the involved components such as update request processors, update chains, and request handlers. Finally, we described how to configure these components and write extensions on top of them.
The purpose of the indexing phase and the index itself is to optimize speed and performance in finding relevant documents during searches. Hence, the whole process is not useful without the search phase, which is the subject of the next chapter.
Chapter 3. Searching Your Data
Once data has been properly indexed, it’s definitely time to search! The indexing phase makes no sense if things end there. Data is indexed mainly to speed up and facilitate searches.
This chapter focuses on the search capabilities offered by Solr and illustrates the several components that contribute to its read path.
The chapter will cover the following topics:
Querying
Search configuration
The Solr read path: query parsers, search components, request handlers, and response writers
Extending Solr
Troubleshooting
The sample project
Throughout this chapter, we will use a sample Solr instance with a configuration that includes all the topics we will gradually describe. This instance will have a set of simple documents representing music albums. These are the first three documents:
<doc>
  <field name="id">1</field>
  <field name="title">A Modern Jazz Symposium of Music and Poetry</field>
  <field name="composer">Charles Mingus</field>
  …
</doc>
<doc>
  <field name="id">2</field>
  <field name="title">Where Jazz meets Poetry</field>
  <field name="artist">Raphael Austin</field>
  …
</doc>
<doc>
  <field name="id">3</field>
  <field name="title">I'm In The Mood For Love</field>
  <field name="composer">Charlie Parker</field>
  <field name="genre">Jazz</field>
  …
</doc>
The source code of the sample project associated with this chapter contains the entire Maven project, which can be either loaded in Eclipse or used via the command line. As a preliminary step, open a shell (or run the following command within Eclipse) in the project folder and type this:
# mvn clean cargo:run -Pquerying
The preceding command will start a new Solr instance, with sample data preloaded.
Tip
The sample data is automatically loaded at startup by means of a custom SolrEventListener. You can find the source code under the org.gazzax.labs.solr.ase.ch3.listener package.
You can use the page located at http://127.0.0.1:8983/solr/#/example/query to try out by yourself the several things we will discuss.
Tip
If you loaded the project in Eclipse, under /src/dev/eclipse you will find the launch configuration used to start Solr.
Querying
Solr can be seen as a tell-and-ask system; that is, you first put in (index) some data, then it can answer questions you ask (query) about that data. Since the actors involved in these interactions are not humans, Solr provides a formal and systematic way to execute both index and query operations. Specifically, from a query perspective, that requires a specialized language that can be interpreted by Solr in order to produce the expected answers. Such a language is usually called a query language.
Search-related configuration
The solrconfig.xml file has a <query> section that contains several search settings. Most of them are related to caches, a critical topic that will be described in Chapter 5, Administering and Tuning Solr.
As we already said for the index section, all those parameters have good defaults that work well in a lot of scenarios. This list describes the relevant settings (cache settings are not included):
Searcher lifecycle listeners: Whenever a searcher is opened, it's possible to configure one or more queries that will be automatically executed in order to prepopulate caches.
Use cold searcher: When this attribute is set to true and a search is issued while there isn't a registered searcher, the current warming searcher is used immediately. If it is set to false, the incoming request will wait until the warming completes.
Max warming searchers: This is the maximum number of searchers that can be warming in parallel. The example configuration contains a value of 2, which is good for pure searcher instances. For indexers (which could also be searchers), a higher value could be needed.
Query analyzers
In the previous chapter, we discussed analyzers. Their meaning here is the same; the only difference is their input value. When we index data, that value is the content of the fields that make up the input documents. At query time, the analyzer processes a value, term, or phrase that comes from a query parser and represents a component of the user-entered query.
Tip
In the previous chapter, we used the analysis page to see how text analysis works at index time. That very page has an additional section that can be used to see the same process using the query analyzer.
Common query parameters
Besides a search string, a query to Solr includes several parameters that are passed using standard HTTP procedures, that is, as name/value pairs in the query string, like this:
http://127.0.0.1:8080/solr/ch3/search?q=history&start=10&rows=10&sort=title+asc
While some of them strictly depend on the component that will be in charge of handling the request, there is a set of common parameters. The following table describes them:
Parameter Description
q The search string that indicates what we are asking Solr, according to a given syntax.
start The start offset within search results. This is used to paginate search results.
rows The maximum size (that is, number of documents) of the returned page.
sort A comma-separated list of (indexed) fields that will be used to sort search results. Each field must be followed by the keyword asc (for ascending order) or desc (for descending order).
defType Indicates the query parser that will interpret the search string. Each query parser has different features, follows different rules, and accepts a different query syntax.
fl A comma- or space-separated list of fields that will be returned as part of the matched documents.
fq A filter query. The parameter can be repeated.
wt The response writer that will determine the response output format.
debugQuery If this is true, an additional section will be appended to the response with an explanation of the current read path.
explainOther A query identifying one or more documents (for example, by unique key, as in id:2) that are not necessarily part of the search results for the main query. Solr will add a section to the response explaining how those documents score against the main query, which helps in understanding why they have been excluded from the search results.
timeAllowed A constraint on the maximum amount of time (in milliseconds) allowed for query execution. If the timeout expires, Solr will return only partial results.
cache Enables or disables query caching.
omitHeader By default, the response contains a header with some metadata about the query execution (for example, input parameters or query execution time). If this parameter is set to true, the header is omitted from the response.
The following are some example queries:
http://localhost:8983/solr/example/query?q=charles&fq=genre:Jazz&rows=5&omitHeader=true&debugQuery=true
http://localhost:8983/solr/example/query?q=charles&rows=10&omitHeader=true&debugQuery=true&explainOther=id:2
http://localhost:8983/solr/example/query?q=*:*&start=5&rows=5
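These parameters are plain name/value pairs, so a client can assemble a request URL with nothing more than the JDK. The following Java sketch does that and URL-encodes every value with java.net.URLEncoder (it requires Java 10 or later). The class name SolrQueryUrl is hypothetical. The base URL comes from the examples above. A repeated parameter such as fq is simply added once per value.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that assembles a Solr query URL from ordered name/value pairs. */
class SolrQueryUrl {

    /**
     * Builds baseUrl?name=value&name=value..., preserving the order of the entries.
     * To repeat a parameter (for example, several fq values), add one entry per value.
     */
    static String build(String baseUrl, List<Map.Entry<String, String>> params) {
        List<String> pairs = new ArrayList<>();
        for (Map.Entry<String, String> p : params) {
            // Encode the value so that spaces, colons, quotes, and so on travel safely.
            pairs.add(p.getKey() + "=" + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + "?" + String.join("&", pairs);
    }
}
```

For example, a sort value of title asc is sent as sort=title+asc. The servlet container decodes the + back into a space before Solr reads the parameter.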
As you can imagine, the q parameter, which contains the query, will be very important in this chapter. Besides this, there are two other parameters, fl (field list) and fq (filter queries), that will be described in the next sections because they have some interesting aspects.
Field lists
The fl parameter indicates which fields (among fields that have been marked as stored) will be returned in documents within a query response. Think of these two scenarios:
A schema that contains a lot of fields, probably defining multiple entities (for example, books and authors). I'm looking for books, so I don't want to see any author attributes (and vice versa).
A schema that contains stored fields with a lot of text, used, for example, by the highlighting component (which requires highlight snippets to come from a stored field). When I execute queries, I don't want those fields to be returned as part of the matching documents. In other words, I want to exclude those fields from search results.
The fl parameter specifies the list of fields that will make up each matched document, thus filtering out unwanted attributes. The parameter accepts a space- or comma-separated list of values, where each value can be any of the following:
A field name (for example, title, artist, released, and so on).
The literal score, which is a virtual field indicating the computed score for each document.
A glob, which is an expression that dynamically matches one or more fields by means of the * and ? wildcard characters (for example, art*, r?leas?d, and re?leas*).
The asterisk (*) character, which matches all available (that is, stored) fields.
A function that, when evaluated, will produce a value for a virtual field that will be added to documents.
A transformer. Like a function, this is another way to create virtual fields in documents, with additional data such as the Lucene document ID, the shard identifier, or the query execution explanation.
Explicit fields, score, functions, and transformers can be aliased by prefixing them with a name followed by a colon; that name will be used in place of the real name of that member.
Tip
SOLR-3191 tracks the activity related to a so-called field exclusion feature. Once this patch has been applied, it will be possible to explicitly indicate which fields must not be part of the returned documents.
The following table lists some examples of the fl parameter:
Example Description
*,score All stored fields and the score virtual field
t*,*d All fields starting with t, plus all fields ending with d
max(old_price,new_price) Maximum value between old_price and new_price
max_price:max(p1,p2) A function alias
title,t_alias:title,[docid] Title, aliased title, and a transformer
The difference between the third and fourth examples in the preceding table is in the name of the field that will hold the function value. In the first case, it will be the function expression itself; in the other, it will be a virtual field called max_price.
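The glob rules above can be illustrated with a short Java sketch. It resolves an fl value against a list of stored field names by translating * into "any sequence of characters" and ? into "exactly one character". This is not Solr's implementation. It handles only plain names and globs (no score, functions, transformers, or aliases), and the class name and field names are illustrative. As the t*,*d example in the table shows, the globs in one fl value are combined as a union.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative matcher for fl-style globs; not Solr's actual implementation. */
class FieldListGlobs {

    /** Converts a glob such as r?leas?d into an anchored regular expression. */
    static Pattern toRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') {
                regex.append(".*");                              // zero or more characters
            } else if (c == '?') {
                regex.append('.');                               // exactly one character
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));  // literal character
            }
        }
        return Pattern.compile(regex.toString());
    }

    /** Returns the stored fields matched by at least one glob in fl, in schema order. */
    static List<String> select(String fl, List<String> storedFields) {
        List<Pattern> globs = new ArrayList<>();
        for (String token : fl.trim().split("[,\\s]+")) {
            globs.add(toRegex(token));
        }
        List<String> selected = new ArrayList<>();
        for (String field : storedFields) {
            for (Pattern glob : globs) {
                if (glob.matcher(field).matches()) {
                    selected.add(field);
                    break;                                       // union: one match is enough
                }
            }
        }
        return selected;
    }
}
```

With the fields id, title, artist, released, and genre, t*,*d selects id, title, and released. The *d glob also catches id, which is easy to overlook when writing globs.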
Tip
With the sample instance running, you can try these examples by issuing a request such as http://127.0.0.1:8983/solr/example/query?q=id:1&fl=, replacing the value of the fl parameter.
A complete list of available functions can be accessed at http://wiki.apache.org/solr/FunctionQuery#Available_Functions.
A complete list of available transformers can be read at https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents.
Filter queries
Filter queries perform a kind of intersection on top of the documents resulting from the execution of the main query. A filter query is like having a required condition in your main query (that is, an additional clause concatenated with the AND operator), but with some important differences:
It is executed separately from, and before, the main query.
The filter and the intersection are applied on top of the main query results.
It doesn't influence the score of the documents, which is computed in the execution of the main query.
The results of filter queries are cached separately so that they can be reused for further executions.
There can be more than one fq parameter in a search query. In this case, the result of the overall execution will take into account all filter clauses, resulting in documents that satisfy the intersection between the main results and the results of each filter query.
Filter query caching is one of the most crucial features of Solr. A filter query's design should reflect the access pattern of requestors as much as possible. Consider this filter query:
fq=genre:Jazz AND released:1981
The preceding query will cache the results of those two clauses together. So, if your application provides two separate filters (for the end users), genre and released, the following filter queries won't benefit from this cache, and they will be cached (again) separately:
fq=genre:Jazz
fq=released:1981
In this situation, the first query should be rewritten in the following way, allowing reuse of the cache associated with each filter query:
fq=genre:Jazz&fq=released:1981
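The effect of splitting filters can be simulated with a toy cache keyed by the raw fq string. In the following Java sketch, a filter is evaluated into a BitSet of document positions only on a cache miss, and the main result set is then intersected with every filter without touching scores. The sketch is hypothetical and much simpler than Solr's filterCache. The class name is ours, the documents are plain maps of field values, and each filter's matching logic is passed in as a predicate.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Toy model of filter query caching: one cache entry per distinct fq string. */
class ToyFilterCache {
    private final List<Map<String, String>> docs;
    private final Map<String, BitSet> cache = new HashMap<>();
    int misses = 0;

    ToyFilterCache(List<Map<String, String>> docs) {
        this.docs = docs;
    }

    /** Returns the cached document set for fq, computing it with the given predicate on a miss. */
    BitSet filter(String fq, Predicate<Map<String, String>> matches) {
        return cache.computeIfAbsent(fq, key -> {
            misses++;
            BitSet bits = new BitSet(docs.size());
            for (int i = 0; i < docs.size(); i++) {
                if (matches.test(docs.get(i))) {
                    bits.set(i);
                }
            }
            return bits;
        });
    }

    /** Intersects the main query results with each filter; no score is involved. */
    static BitSet apply(BitSet mainResults, BitSet... filters) {
        BitSet result = (BitSet) mainResults.clone();
        for (BitSet f : filters) {
            result.and(f);
        }
        return result;
    }
}
```

In this model, genre:Jazz AND released:1981 would occupy a single cache entry of its own. Sending genre:Jazz and released:1981 as two fq values creates two entries, and each can be reused by any later request that filters on just one of them.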
Query parsers
A query parser is a component responsible for translating a search string or expression into specific instructions for Solr. Every query parser understands a given syntax for expressing queries.
Solr comes with several query parsers, giving requestors a wide range of ways of asking for what they need.
The Solr query parser
The Solr query parser, often mistakenly called the Lucene query parser, is implemented in org.apache.solr.search.LuceneQParserPlugin. It is rather a schema-driven superset of the default Lucene query parser.
Note
Note the Plugin suffix of the class name. Solr provides an extensible framework for creating and plugging in your own query parser.
The following sections will describe the relevant aspects of this parser.
Terms, fields, and operators
You've already met terms. They are atomic units of information resulting from an analysis applied to given text. At index time, that text is the value of a field belonging to a given (input) document. At query time, terms come from the user-entered query string. Specifically, a query string is broken into terms, fields, and operators.
Terms can be simple or compound terms; for example, they can be single words such as CM, Standard, and 1959 or phrases such as "Goodbye Pork Pie Hat". Phrases are two or more words surrounded by double quotes.
Fields are what we declared in the schema.xml file. Their use within a search string allows a requestor to express instructions such as "search x in field y," where x is a term or a phrase and y is the field name. Here are some examples of the use of fields:
title:"Where Jazz meets Poetry"
composer:Mingus
Operators are keywords or symbols used as conjunctions between several field-value criteria in order to create complex expressions, such as these:
title:Jazz OR composer:Charlie AND released:1959
genre:Jazz AND NOT released:1959
The following table describes the available operators:
Operator Description
AND A conjunction between two criteria, both of which must be satisfied
OR A conjunction between two criteria, at least one of which must be satisfied
+ Marks a term as required
-/NOT Marks a term as prohibited
It's also possible to use a pair of parentheses to group several field or value criteria, like this:
(released:1957 AND composer:Mingus) OR (released:1976 AND NOT genre:Jazz) OR released:(1988 OR 1959)
Boosts
Boosting allows you to control the relevance of a given matching document, thus offering a way to give some query results more importance than others; for example, if you are mainly interested in Jazz and less in Fusion albums, you could use this:
genre:Fusion OR genre:Jazz^2
The boost factor is inserted after a field value criterion and prefixed with a caret symbol. It has to be greater than 0, and since it is a factor, a value between 0 and 1 lowers the relevance of the criterion (in effect, a negative boost). If it is absent, a default boost factor of 1 will be applied.
Wildcards
The wildcard characters, * and ?, can be used within terms, with zero or more occurrences. They cannot be applied to compound terms (that is, search phrases) or to numeric and date types. The ? wildcard matches a single character, while * matches zero or more sequential characters. Here are some examples of wildcards:
(title:moder* AND artist:Min*) OR artist:(Yngw?e AND M?lm*)
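Fuzzy
The tilde symbol (~) at the end of a term enables a so-called fuzzy query, allowing you to match terms that are similar to that term. Fuzzy logic is based on the Damerau-Levenshtein distance algorithm. After the tilde, you can put an integer between 0 and 2 indicating the maximum number of edits allowed, so a higher value means that less similarity is required. If no value is given, a default of 2 is used. A float between 0 and 1, as in the following example, is still accepted for backward compatibility and is interpreted as a minimum similarity.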
With the example Solr instance running, open the query page in the admin console and type the following query:
artist:Charles~0.7
The query response will contain two results. The first is an album by Charles Mingus, which is a perfect match with the search term entered. The second artist is Charlie Parker, whose name is similar but not equal to Charles.
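To see why Charlie matches, it helps to count edits. The following Java sketch computes the restricted Damerau-Levenshtein distance, also known as optimal string alignment. In this distance, an insertion, a deletion, a substitution, or a transposition of two adjacent characters each count as one edit. Lucene evaluates fuzzy queries with automata, not with this dynamic-programming table, so the sketch only illustrates the distance itself. The similarityToEdits helper is an assumption. As far as we know, it mirrors how Lucene 4.x converts a legacy similarity such as 0.7 into a maximum edit count for a term of a given length, capped at 2.

```java
/** Illustrative fuzzy-matching helpers; Lucene itself uses Levenshtein automata. */
class FuzzyDistance {

    /** Restricted Damerau-Levenshtein (optimal string alignment) distance between a and b. */
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;    // i deletions
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;    // j insertions
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(
                        d[i - 1][j] + 1,                      // deletion
                        d[i][j - 1] + 1),                     // insertion
                        d[i - 1][j - 1] + cost);              // substitution (or match)
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);  // adjacent transposition
                }
            }
        }
        return d[a.length()][b.length()];
    }

    /** Assumed conversion of a legacy similarity (0..1) into a maximum edit count, capped at 2. */
    static int similarityToEdits(double similarity, int termLength) {
        if (similarity >= 1) return (int) Math.min(similarity, 2);  // already an edit count
        if (similarity == 0) return 0;                              // exact match only
        return Math.min((int) ((1.0 - similarity) * termLength), 2);
    }
}
```

For the seven-letter term charles, 0.7 becomes two edits. charles and charlie differ by exactly two substitutions in their last two letters, so Charlie Parker falls within the allowed distance.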
Proximity
The same symbol that is used for a fuzzy query has a different meaning when used in conjunction with phrase queries. Now run the following query:
title:"Jazz Poetry"
You won't get any results because there's no record with those two consecutive terms in the title. Using a tilde followed by a number, which expresses a distance between terms, you can enable a proximity search, allowing matches of documents that have those two terms within a specific distance from one another.
This query will match the document that has Where Jazz meets Poetry as its title:
title:"Jazz Poetry"~2
The following query will also match the document that has A Modern Jazz Symposium of Music and Poetry as its title:
title:"Jazz Poetry"~4
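The following Java sketch shows why ~2 is enough for the first title but the second needs ~4. It computes the slop that a two-term phrase requires. For terms at positions p1 and p2, it uses |(p2 - 1) - p1|, which is how far the second term sits from the position an exact phrase expects. The sketch rests on several assumptions: whitespace tokenization, lowercasing, no stopword removal, and only the first occurrence of each term. Real analysis chains and Lucene's sloppy phrase matching handle many more cases, and the class name is ours.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative slop computation for a two-term phrase; not Lucene's algorithm. */
class PhraseSlop {

    /** Returns the minimum slop for the phrase "first second" to match text, or -1 if a term is missing. */
    static int requiredSlop(String text, String first, String second) {
        List<String> tokens = Arrays.asList(text.toLowerCase().split("\\s+"));
        int p1 = tokens.indexOf(first.toLowerCase());
        int p2 = tokens.indexOf(second.toLowerCase());
        if (p1 < 0 || p2 < 0) {
            return -1;
        }
        // An exact phrase expects the second term at p1 + 1; slop is the displacement from there.
        return Math.abs((p2 - 1) - p1);
    }
}
```

Where Jazz meets Poetry requires a slop of 1, so ~2 (or more) matches. A Modern Jazz Symposium of Music and Poetry requires 4. Swapping the two terms costs 2, which is why reversed adjacent terms need ~2.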
Ranges
Range searches allow us to specify, for a given field, a set of matching values that fall between a lower and a higher bound, inclusive or exclusive of those bounds. Here are some examples of ranges:
released:[1957 TO 1988]
released:[1957 TO *]
released:[* TO 1988]
released:{1957 TO 1988}
released:[1957 TO 1988}
genre:[Jazz TO "New Age"]
You can see that the lower and higher bounds can be literal values, as shown in the first example, where we are searching for albums released between 1957 and 1988. The bounds can also be wildcards, as shown in the second and third examples. Square and curly brackets are used to denote an included or an excluded bound, respectively. So, in the first example, both 1957 and 1988 are included; in the fourth example, they are excluded; and in the fifth example, only 1957 is included.
Keep in mind that, for non-numeric fields (as shown in the last example in the preceding code snippet), values are compared lexicographically. Therefore, a sequence such as 1, 02, 14, 100 will be ordered as 02, 1, 100, 14 in lexicographic order, which is very different from a numeric order.
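This difference is easy to reproduce in plain Java. The following sketch sorts the strings 1, 02, 14, and 100 in their natural (lexicographic) order and then as integers. It also checks an inclusive range on strings, comparing the value with both bounds as a non-numeric field would. The class and method names are illustrative. For ASCII values, String.compareTo gives the same order as the character-by-character comparison described above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Contrasts lexicographic and numeric ordering of the same values. */
class OrderingDemo {

    /** Character-by-character (lexicographic) order, as for string fields. */
    static List<String> lexicographic(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    /** Numeric order, as for numeric fields. */
    static List<String> numeric(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.comparingInt((String s) -> Integer.parseInt(s)));
        return sorted;
    }

    /** Inclusive range check on strings, as for [lower TO upper] on a non-numeric field. */
    static boolean inLexicographicRange(String value, String lower, String upper) {
        return value.compareTo(lower) >= 0 && value.compareTo(upper) <= 0;
    }
}
```

Note that the string 2 falls outside the lexicographic range [1 TO 14], whereas the string 100 falls inside it.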
The Disjunction Maximum query parser
The Solr query parser is powerful when it comes to building complex expressions. However, those are quite far from what the user usually types in a search field.
Think about the Google search page. What do you type in the search text field? Not an expression, but just one, two, or more terms associated with what you're looking for.
The Disjunction Max (DisMax) query parser directly processes those user-entered terms and searches for each of them across a set of configurable target fields, with a configurable weight for each field.
Note
The DisMax parser is enabled by setting the defType parameter to dismax.
The example Solr instance has a request handler listening to /glike1 that uses the DisMax parser.
Besides search terms, this query parser supports some features of the Solr query parser, such as quotes, which can be used to indicate phrases, and the + and - operators, which mark mandatory and prohibited terms, respectively. All the other term modifiers we saw for the Solr query parser are escaped, so they will be interpreted as search terms.
The name of the parser comes from its behavior:
Dis: This stands for disjunction, which means that, for each word in the query string, the parser builds a new subquery across the fields and boosts specified in the qf parameter. The resulting queries are subjected to a first (required) constraint defined with the mm parameter and to a set of optional clauses defined with other parameters, which we will see later.
Max: This stands for maximum, and it pertains to the score computation. The DisMax parser scores a given document by taking the maximum score value among all matching subqueries.
The following sections describe the several parameters that the parser accepts.
Query fields
The qf parameter indicates a set of target fields with their corresponding (optional) boosts. Fields are separated by spaces, and each of them can have an optional boost associated with it, resulting in expressions such as this:
qf=title^3.5 artist^2.0 genre^1.5 released
Here, we want to search across four fields, each of them with a different importance, which will affect the score assigned to each matching document. The qf parameter is one of the main places where we define our search strategy, depending on customer requirements.
Tip
In OPACs (online public access catalogs), there's a never-ending debate about which attribute is more relevant, titles or subjects. A title, as you can imagine, is important, but it might not contain terms that are representative of a work. A subject is a kind of controlled classification assigned by a professional user (that is, a librarian). As a search service provider, you can use the qf parameter to configure boosts depending on customer needs, and avoid entering that debate!
The DisMax query parser has another interesting feature when searching fields declared in the qf parameter: when those fields are numeric or dates, inappropriate terms are dropped. Returning to the preceding qf expression, consider searching for this:
Mingus 1962
For the title, artist, and genre fields, Solr will build two queries. But for the released field, it will create just one query using the word 1962, thus resulting in a total of seven queries:
title:Mingus^3.5, artist:Mingus^2.0, genre:Mingus^1.5, title:1962^3.5,
artist:1962^2.0, genre:1962^1.5, released:1962
As you can see, the released:Mingus query has been dropped because released is a numeric field.
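The expansion above can be sketched in Java. The following code pairs every whitespace-separated term with every qf field, but it skips terms that don't parse as integers when the target field is numeric. This is a simplification for illustration only. We assume that released is the only numeric field and that a plain integer check is enough. The string output format is ours too, because Solr builds Lucene query objects, not strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative DisMax-style term expansion across qf fields; not Solr's code. */
class DisMaxExpansion {

    /** Parses "title^3.5 artist^2.0 released" into field -> boost, with 1.0 as the default boost. */
    static Map<String, Float> parseQf(String qf) {
        Map<String, Float> fields = new LinkedHashMap<>();
        for (String spec : qf.trim().split("\\s+")) {
            int caret = spec.indexOf('^');
            if (caret < 0) {
                fields.put(spec, 1.0f);
            } else {
                fields.put(spec.substring(0, caret), Float.parseFloat(spec.substring(caret + 1)));
            }
        }
        return fields;
    }

    /** Builds one clause per (term, field) pair, dropping non-numeric terms on numeric fields. */
    static List<String> expand(String userQuery, String qf, Set<String> numericFields) {
        Map<String, Float> fields = parseQf(qf);
        List<String> clauses = new ArrayList<>();
        for (String term : userQuery.trim().split("\\s+")) {
            for (Map.Entry<String, Float> f : fields.entrySet()) {
                if (numericFields.contains(f.getKey()) && !term.matches("-?\\d+")) {
                    continue;  // for example, released:Mingus would be meaningless, so it is dropped
                }
                clauses.add(f.getKey() + ":" + term + "^" + f.getValue());
            }
        }
        return clauses;
    }
}
```

For the input Mingus 1962 and the qf value above, the sketch returns the same seven clauses. Solr then groups the clauses for each term into one disjunction, whereas the sketch only lists them.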
Alternative query
The optional q.alt parameter defines a query that will be used in the absence of the main query.
By default, the q.alt query is parsed using the Solr query parser, so it accepts the syntax we described in the previous section. Using LocalParams, you can change the q.alt parser.
Minimum should match
Every word or phrase that is part of the search string, unless it is constrained by the + or - operators (and therefore marked as required or prohibited), is considered optional. For those optional parts, the mm parameter defines the minimum number of matches that satisfy the query execution. The interesting point here is that, besides accepting an absolute number or a percentage, this parameter also allows complex expressions. The following table illustrates some examples of mm:
Value Description
An integer (for example, 3) At least the given number of optional clauses must match.
A percentage (for example, 66%) At least the given percentage of optional clauses must match (the resulting number is rounded down).
A negative number or a negative percentage The number of optional clauses that must match is the result of subtracting the given value (absolute or percentage, depending on the parameter value) from the total number of optional clauses.
One or more expressions with the X<Y format If there are X or fewer optional clauses, all of them must match. If there are more than X clauses, Y is used as the mm value. Y can be a positive or negative integer or a percentage value. It is also possible to concatenate several expressions, like this:
3<75% 6<-1
This means that, with up to three optional clauses, all of them are required. Between four and six optional clauses, we require a match of 75 percent. Finally, for more than six clauses, we require a match of all clauses but one.
The several subqueries resulting from search term parsing are constrained with the mm parameter (specifically, an additional Boolean query acting as a constraint is concatenated with the AND operator), so documents that don't satisfy the mm constraint won't be part of the search results.
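The following Java function sketches how an mm specification turns into a required clause count, following the rules in the table. Percentages are rounded down, negative values are subtracted from the total, and each X<Y condition applies only when there are more than X optional clauses. The function is modeled on these documented rules and not copied from Solr, so its behavior on malformed or unusual specifications may differ. The class name is ours.

```java
/** Sketch of minimum-should-match evaluation, modeled on the rules described above. */
class MinShouldMatch {

    /** Returns how many of the given optional clauses must match for the mm specification. */
    static int required(int optionalClauses, String spec) {
        spec = spec.trim().replaceAll("\\s*<\\s*", "<");   // tolerate "3 < 75%"
        if (spec.contains("<")) {
            int result = optionalClauses;                  // at or below the first bound: all required
            for (String condition : spec.split("\\s+")) {
                String[] parts = condition.split("<");
                int upperBound = Integer.parseInt(parts[0]);
                if (optionalClauses <= upperBound) {
                    return result;
                }
                result = required(optionalClauses, parts[1]);
            }
            return result;
        }
        int value;
        if (spec.endsWith("%")) {
            int percent = Integer.parseInt(spec.substring(0, spec.length() - 1));
            value = optionalClauses * percent / 100;       // integer division rounds toward zero
        } else {
            value = Integer.parseInt(spec);
        }
        int result = value < 0 ? optionalClauses + value : value;  // negative: subtract from total
        return Math.max(0, Math.min(result, optionalClauses));
    }
}
```

With 3<75% 6<-1, three optional clauses require 3, five require 3 (75 percent of 5, rounded down), and eight require 7.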
Phrase fields
Once the list of matching documents has been populated according to the search criteria and constraints (for example, mm or filter queries), the pf parameter raises the score of documents that have the search terms in close proximity.
Like the qf parameter, pf can declare a list of fields with an optional boost factor.
Query phrase slop
The qs parameter indicates a proximity factor (slop) to be used in the phrase queries that are explicitly included in the search string.
Phrase slop
The ps parameter indicates a proximity factor to be used in the phrase queries built for pf fields. Note that such queries will be executed only to boost results (see the previous section), so this parameter doesn't affect matching but only boosting.
Boost queries
The bq parameter defines a query, parsed by the Solr query parser, that will additionally boost search results. It can be repeated, thus allowing one or more queries.
If, for example, you want to give more importance to items with a price that falls within a given range, you can use a boost query like this:
price:[10.00 TO 19]
Additive boost functions
The bf parameter defines a function that will additionally boost search results by adding its value to the computed score. As with the bq parameter, it can be repeated in order to have multiple functions.
Tie breaker
The tie parameter is a float number with a value between 0 and 1, and it affects the strategy used by the parser to determine the final score of a given (matching) document.
The Disjunction Max parser, as said before, executes a set of subqueries on top of the fields declared in the qf parameter. The subquery that has the maximum score determines the score of the document. So, schematically:
documentScore = score of matching subquery with highest score
However, you could end up with two documents getting the same score, because the maximum value computed by each winning subquery is the same.
The tie parameter gives you fine-grained control over the final score assigned to each document, by including the scores of all matching subqueries in the computation. Those additional scores are multiplied by a factor, the tie value. So, the preceding formula becomes the following:
documentScore = (score of matching subquery with highest score) + (tie * (sum of scores of other matching subqueries))
With a value of 0.0, we will have a pure disjunction max query, where only the maximum score is included. A value of 1.0 will lead to a disjunction sum query, where the final score is the sum of the scores of all matching subqueries.
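The formula translates directly into a few lines of Java. In the following sketch, the subquery scores are made-up inputs (in Solr, they come from the similarity computation), and the class name is ours. The example also shows how a nonzero tie separates two documents whose best subqueries have the same score.

```java
import java.util.Arrays;

/** Sketch of the DisMax scoring formula with a tie breaker; subquery scores are given as input. */
class TieBreaker {

    /** Returns max + tie * (sum of the other matching subquery scores). */
    static double score(double tie, double... subqueryScores) {
        if (subqueryScores.length == 0) {
            return 0.0;                                    // no matching subquery: no score
        }
        double max = Arrays.stream(subqueryScores).max().getAsDouble();
        double sum = Arrays.stream(subqueryScores).sum();
        return max + tie * (sum - max);                    // sum - max = scores of the other subqueries
    }
}
```

Two documents with subquery scores {2.0, 1.0} and {2.0} both score 2.0 with tie=0.0. With tie=0.1, the first one scores 2.1 and moves ahead.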
The Extended Disjunction Maximum query parser
This parser (eDisMax) is built on top of the DisMax parser and has some additional features, such as fielded search, Boolean operators, term modifiers, and better handling of mistakes in queries.
Note
The eDisMax parser can be enabled by setting the defType parameter to edismax.
The example Solr instance has a request handler listening to /glike2 that uses the eDisMax parser.
The following sections describe additional parameters that this parser accepts. All the parameters described in the DisMax parser section are also supported.
Fielded search
The eDisMax parser supports the full syntax of the Solr query parser, therefore allowing a so-called fielded search (for example, title:Jazz) with Boolean operators and term modifiers (for example, fuzzy and proximity).
In addition, this parser supports field aliasing and renaming. This allows you to give the requestor (for example, an end user, a query client, and so on) a view of the data that is partially or completely decoupled from Solr's underlying data model.
Aliasing is done using the following syntax:
f.<alias>.qf=(one or more real fields with optional boosts)
Here, <alias> is the virtual name that will be associated with the field (or fields) declared on the right-hand side. As you can see, an alias can be applied to a single field or to a group of fields. When aliases are declared, requestors can use them in their queries.
We can use aliases to localize field names:
f.artista.qf=artist // Italian users will see an "artista" field
f.kunstler.qf=artist // for German users
We can also use them to create meta fields that group a set of real fields (as in qf, the field names are separated by spaces):
f.people.qf=author illustrator editor translator
f.titles.qf=title front_cover_title sub_title uniform_title
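The following Java sketch shows the effect of an alias on a single fielded clause. It rewrites alias:value into one clause per real field listed in the corresponding f.<alias>.qf value. It is an approximation. eDisMax combines the real fields with disjunction-max scoring, so the OR rendering here is only a readable stand-in. The class name is hypothetical, the sketch ignores boosts, and the alias map stands in for the request parameters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative expansion of eDisMax-style field aliases; not the parser's real code. */
class FieldAliases {

    /**
     * Rewrites alias:value into one clause per real field, joined with OR.
     * aliases maps an alias name to the value of its f.<alias>.qf parameter (space-separated fields).
     */
    static String expand(String fieldedTerm, Map<String, String> aliases) {
        int colon = fieldedTerm.indexOf(':');
        if (colon < 0) {
            return fieldedTerm;                            // not a fielded clause
        }
        String field = fieldedTerm.substring(0, colon);
        String value = fieldedTerm.substring(colon + 1);
        String realFields = aliases.get(field);
        if (realFields == null) {
            return fieldedTerm;                            // not an alias: leave the clause untouched
        }
        List<String> clauses = new ArrayList<>();
        for (String real : realFields.trim().split("\\s+")) {
            clauses.add(real + ":" + value);
        }
        return clauses.size() == 1 ? clauses.get(0) : "(" + String.join(" OR ", clauses) + ")";
    }
}
```

With f.people.qf=author illustrator editor translator, the clause people:Mingus expands to all four real fields. Combined with the uf parameter described later, requestors then never need to know the real field names.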
Phrase bigram and trigram fields
Besides supporting the pf parameter we have already seen for DisMax, this parser adds two optional features. The pf parameter boosts the score of documents where the input terms appear in proximity. The pf2 and pf3 parameters offer the same feature, but they split the input terms into consecutive bigrams and trigrams, respectively. Therefore, the input string All the things you are will become the following set of (consecutive) bigrams:
All the, the things, things you, you are
By the same logic, it will become the following set of trigrams:
All the things, the things you, things you are
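The bigram and trigram splitting is simple to reproduce. The following Java sketch only generates the word groups for a given n (2 for pf2, 3 for pf3). It does not model how eDisMax turns each group into a boosting phrase query with ps2 or ps3 as the slop. It assumes whitespace-separated input, and the class name is ours.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Splits input terms into consecutive word n-grams, as used conceptually by pf2 (n=2) and pf3 (n=3). */
class WordNGrams {

    static List<String> of(String input, int n) {
        String[] words = input.trim().split("\\s+");
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= words.length; i++) {
            // Join n consecutive words starting at position i.
            grams.add(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
        }
        return grams;
    }
}
```

An input shorter than n words produces no groups, so pf3 contributes nothing for a two-word query.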
Phrase bigram and trigram slop
Just as ps sets the phrase slop for the pf parameter, ps2 and ps3 do the same for pf2 and pf3. If they are absent, the value of ps is used.
Multiplicative boost function
The boost parameter declares a function, like the bf parameter we have seen for the DisMax parser. The difference here is that the computed score is multiplied by the function value (instead of having the value added to it).
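The practical difference between the two styles is how they interact with the score's magnitude. A bf value has the same absolute effect on every document, whereas a multiplicative boost scales each score proportionally. The following Java sketch shows only this arithmetic. The scores and function values are hypothetical numbers, and the class name is ours.

```java
/** Contrasts additive (bf) and multiplicative (boost) function boosting on a given score. */
class FunctionBoosts {

    /** bf (DisMax and eDisMax): the function value is added to the score of the main query. */
    static double additive(double score, double functionValue) {
        return score + functionValue;
    }

    /** boost (eDisMax): the score of the main query is multiplied by the function value. */
    static double multiplicative(double score, double functionValue) {
        return score * functionValue;
    }
}
```

With a function value of 0.5, scores of 2.0 and 4.0 become 2.5 and 4.5 with bf, but 1.0 and 2.0 with boost. The additive gap stays at 2.0, while the multiplicative gap halves along with the scores.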
User fields
The uf parameter specifies which fields (real or virtual) the requestors are allowed to use in their queries. Used in conjunction with aliasing, it allows you to completely hide real fields and have queries with only virtual (that is, aliased) fields.
Lowercase operators
In plain Solr query parser syntax, operators need to be in uppercase (AND, OR). The lowercaseOperators flag parameter, which defaults to true, allows lowercase tokens (and, or) to be interpreted as operators.
Note
At the time of writing this book, only the and and or Boolean operators are affected by this parameter. The NOT operator is not handled, and therefore the lowercase word not is parsed as a literal term, even if lowercaseOperators is set to true. The Jira issue at https://issues.apache.org/jira/browse/SOLR-3580 tracks the activity on this topic.
Other available parsers
There are a lot of other available parsers, as listed in the following table:
Parser Code Description
Lucene query parser
lucene The Lucene query parser has more or less the same features as the Solr query parser. However, this is the Lucene-specific implementation.
Function query parser
func Creates a function query from the input string.
Join query parser
join Normalizes relationships between documents by emulating a join.
Term query parser
term Creates a single-term query from the input string.
Boost query parser
boost Creates a boosted query from the input string. An additional parameter, b, is required to indicate the boost function.
Raw query parser
raw Creates a term query from the input string without any text analysis.
Spatial filter query parser
geofilt Enables spatial queries.
Field query parser
field Creates a field query from the input string.
Surround query parser
surround Creates a surround query. This query is used for proximity searches.
Besides all of this, the query parser framework has been conceived with extensibility in mind, so developers are free to implement, register, and use their own query parsers.
Search components
A search component is a reusable module that contributes to search results. While defining a search handler, that is, a controller for a given kind of search, you can customize its behavior by defining and configuring search components that will contribute to its output results.
Search components must be declared and used within solrconfig.xml, the main Solr configuration file. A component declaration requires a name, the implementation class, and a set of optional initialization parameters:
<searchComponent name="prices" class="a.b.c.MyComponent">
  <str name="ds-jndi">jdbc/datasource</str>
  <str name="service-uri">http://example.org#me</str>
</searchComponent>
Once declared, these can be used within request handlers, which are the runtime controllers of request executions (we will cover request handlers later in the chapter).
There are some predefined search components that don't need to be explicitly declared in solrconfig.xml.
Note
That doesn't mean they are automatically enabled. They must be explicitly activated or disabled, depending on their default state.
The default components are those responsible for carrying out the fundamental or common steps of a query execution flow. This is the reason there's no need to declare them explicitly, unless you want to use a different configuration. In the following sections, we will illustrate these components.
Query
The query component is responsible for parsing and executing a query. This is the component that accepts query and query parser parameters, gets a reference to the appropriate query parser, coordinates the parser in order to produce a query, executes that query, and outputs a corresponding response.
Facet
This component enables the so-called faceted search. It contributes to search results by adding a set of configurable aggregations called facets.
When you execute a search, you get back a single page of results consisting of a certain number of matching documents. Enabling faceting gives you an additional perspective on the overall data, consisting of a set of aggregations. The following screenshot shows some Solr-powered facets in action on a website, on the right side:
The facet component can be activated by specifying a facet parameter with one of the following values: yes, true, or on.
Solr provides several types of facets: queries, fields, ranges, pivot, and interval. Each of them, whenever enabled, will add a dedicated section to the response.
Facet queries
The facet.query parameter declares a query (parsed by the Solr query parser) that will be used as a facet with the corresponding counts. The results (that is, counts) of this facet will be in a specific response section called facet_queries. The parameter can be repeated multiple times, allowing us to specify several queries. Using the example dataset, with Solr running, open a browser and type http://127.0.0.1:8983/solr/example/select?q=*:*&facet=true&facet.query=genre:Jazz
In the XML response, you will see matching documents within the <result> tag, and an additional section dedicated to facets:
<lst name="facet_counts">
  <lst name="facet_queries">
    <int name="genre:Jazz">3</int>
  </lst>
  <lst name="facet_fields"/>
  <lst name="facet_dates"/>
  <lst name="facet_ranges"/>
</lst>
Here, you can see that three documents match the facet query. The other facet sections are empty because we didn't ask for them.
Facet fields
Facet fields are surely the most popular kind of facets. They aggregate search results using a set of given and configurable fields.
Note
Remember that a field must be declared as indexed in the schema in order to be faceted.
Besides activating the facet feature for a given field, Solr has a rich set of parameters that can be used to tune and configure the field's faceting behavior. These settings can be specified for all fields or for a given field. For the first case, the following table illustrates the available parameters, their names, and meanings. For field-specific settings, the same parameters must be declared with the following convention:
f.<field>.<parameter>=<value>
In this way, the value associated with the parameter will be valid only for the specific field.
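The lookup order implied by this convention can be sketched as follows. A field-specific value, if present, takes precedence over the general parameter, which in turn falls back to a default. In this Java sketch, the class name, the plain map of request parameters, and the default value passed in are all hypothetical. As far as we know, Solr's SolrParams class offers a similar field-level lookup internally.

```java
import java.util.Map;

/** Resolves a faceting parameter, preferring the f.<field>.<parameter> form over the general one. */
class PerFieldParams {

    static String get(Map<String, String> params, String field, String parameter, String defaultValue) {
        String perField = params.get("f." + field + "." + parameter);
        if (perField != null) {
            return perField;                                  // field-specific override wins
        }
        return params.getOrDefault(parameter, defaultValue);  // then the general setting, then the default
    }
}
```

For example, with facet.limit=10 and f.genre.facet.limit=-1, the genre facet returns all its counts, while every other facet field is limited to 10.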
Parameter Description
facet.field Declares a field that will be used as a facet. This parameter must be repeated for each facet field.
facet.prefix Limits the terms used in faceting to values that begin with a given prefix.
facet.sort The sort strategy of counts within each facet. Only two values are allowed: count, which means order by count, and index, which means lexicographic order.
facet.limit The maximum number of counts that can be returned for each facet. A value of -1 will return all available counts.
facet.offset Specifies a start offset within the available counts of facets.
facet.mincount The minimum count needed for a facet value to be included in the response.
facet.missing Includes in the response the count of documents that match the query but don't have a value for a given facet field.
facet.method The type of algorithm that Solr will use to compute facets.
facet.threads The number of parallel workers (that is, threads) that will compute the facets.
Returning to our previous example, let's remove the facet query and use some additional parameters so that facet fields will be built (for simplicity, only the query string is reported):
q=*:*&facet=on&facet.field=genre&facet.mincount=1
In the facet sections, you will see the genre facets under the facet_fields subsection:
<lst name="facet_fields">
  <lst name="genre">
    <int name="Progressive Rock">10</int>
    <int name="Rock">5</int>
    <int name="Fusion">4</int>
    <int name="Heavy Metal">4</int>
    …
    <int name="Pop metal">1</int>
  </lst>
</lst>
We asked for the genre facet and set mincount to 1, which means that facet values with no counts are excluded from the response. It is important to underline the fact that the displayed value for a facet field is its indexed value, and not the stored value (that is, the value that is copied verbatim as it arrives in input documents). In the previous example, the genre field is a String, and therefore, it is not tokenized. This is the reason you see the compound term (Progressive Rock) as one of its values. If that field had been declared as a TextField and tokenized with WhitespaceTokenizer, you would have seen two different values for that facet (assuming no further filtering): Progressive and Rock.
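As a rough model of what field faceting computes, the following Java sketch counts indexed values over the matching documents. It keeps only the values that reach minCount and sorts them by count, as with facet.sort=count. Ties are broken by index order, which we assume matches Solr's behavior. Finally, it applies a limit, with -1 meaning no limit. Every indexed term starts at zero, so terms that occur only in non-matching documents show up when minCount is 0, which is exactly what facet.mincount=1 removes. The class name and the data in the test are invented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy field faceting: count, filter by mincount, sort by count, and limit. */
class FieldFacet {

    static List<Map.Entry<String, Integer>> facet(List<String> indexTerms, List<String> matchingValues,
                                                  int minCount, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        for (String term : indexTerms) {
            counts.put(term, 0);                         // every indexed term starts at zero
        }
        for (String v : matchingValues) {
            counts.merge(v, 1, Integer::sum);            // one count per matching document value
        }
        List<Map.Entry<String, Integer>> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minCount) {
                result.add(Map.entry(e.getKey(), e.getValue()));
            }
        }
        // Highest count first; equal counts in lexicographic (index) order.
        result.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return limit < 0 ? result : result.subList(0, Math.min(limit, result.size()));
    }
}
```

Because the counted values are the indexed terms, tokenizing a genre such as Progressive Rock would yield separate Progressive and Rock entries here as well.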
Facet ranges
Facet ranges can be applied to numeric or date fields. As the name suggests, with facet ranges, Solr creates a facet classification based on ranges. The following parameters control this kind of faceting:
Parameter Description
facet.range Declares a field that will be used as a facet range. The parameter must be repeated for each facet field.
facet.range.start Declares the start of the facet interval.
facet.range.end Declares the end of the facet interval.
facet.range.gap The size of each step between the start and the end of the interval.
The following is a sample query that uses facet ranges for faceting albums by release date:
q=*:*&facet=on&facet.range=released&facet.range.start=1950&facet.range.end=2000&facet.range.gap=10
That will add another section within the facet_counts element:
<lst name="facet_ranges">
  <lst name="released">
    <lst name="counts">
      <int name="1950">1</int>
      <int name="1960">1</int>
      <int name="1970">6</int>
      <int name="1980">8</int>
      <int name="1990">5</int>
    </lst>
    …
  </lst>
</lst>
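The bucket arithmetic behind this output can be mimicked in a few lines of Java. The sketch makes three assumptions. Values are integers, and buckets include their lower bound but not their upper bound (which we understand to be Solr's default). The interval length is also a multiple of the gap, as in the request above. Each bucket is keyed by its lower bound, like the counts shown above. The class name is ours, and the release years in the test are invented, not taken from the sample data.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy range faceting: counts values in [lower, lower + gap) buckets from start up to end. */
class RangeFacet {

    static Map<Integer, Integer> facet(List<Integer> values, int start, int end, int gap) {
        Map<Integer, Integer> counts = new LinkedHashMap<>();
        for (int lower = start; lower < end; lower += gap) {
            counts.put(lower, 0);                              // one bucket per step, possibly empty
        }
        for (int v : values) {
            if (v >= start && v < end) {
                int lower = start + ((v - start) / gap) * gap; // lower bound of the bucket holding v
                counts.merge(lower, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

With start=1950, end=2000, and gap=10, a 1959 release falls in the 1950 bucket and a 1970 release in the 1970 bucket. Years before 1950 or from 2000 onward are not counted.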
Pivot facets
We previously described facet fields; they provide the ability to aggregate search results by one or more categories. Pivot facets go a step further in that direction. They allow us to analyze data in multiple dimensions, breaking down the faceted values by subsequent, nested subcategories.
This kind of faceting can be activated through a request like this:
q=*:*&facet=true&facet.pivot=genre,released
The facet.pivot parameter can be repeated multiple times. For each repetition, there will be a dedicated and aggregated result within the facet_pivot section of the response. Here, for simplicity, we put just one parameter with two categories, genre and released. The following example is an extract of the response you will get using the sample instance associated with this chapter:
<lst name="facet_pivot">
  <arr name="genre,released">
    <lst>
      <str name="field">genre</str>
      <str name="value">Progressive Rock</str>
      <int name="count">10</int>
      <arr name="pivot">
        <lst>
          <str name="field">released</str>
          <int name="value">1992</int>
          <int name="count">2</int>
        </lst>
        <lst>
          <str name="field">released</str>
          <int name="value">1969</int>
          <int name="count">1</int>
        </lst>
        …
      </arr>
    </lst>
    <lst>
      <str name="field">genre</str>
      <str name="value">Rock</str>
      <int name="count">5</int>
      <arr name="pivot">
        <lst>
          <str name="field">released</str>
          <int name="value">1969</int>
          <int name="count">1</int>
        </lst>
        <lst>
          <str name="field">released</str>
          <int name="value">1986</int>
          <int name="count">1</int>
        </lst>
        …
As you can see, the genre facet is broken down by a nested released category. Note that the preceding nested structure is returned with just one request-response interaction. In order to get the same result with classic facet fields, you would have to query Solr several times with incremental filters. That’s why the pivot facets feature, acting as a façade and hiding all of that interaction complexity, is very useful for navigating the hierarchy of those aggregations. However, it should be used carefully, as it could have an impact on performance.
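Conceptually, a two-level pivot is a nested count. The following plain-Java sketch (hypothetical Album class and pivot method, made-up sample data, no Solr dependency) builds the same genre-then-released structure in memory. It only illustrates the shape of the result, not how Solr computes it; for example, Solr orders values by count by default, while the sketch uses natural order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of a two-level pivot: genre -> released -> count.
final class PivotFacetSketch {

    // Minimal document holding the two pivot fields
    static final class Album {
        final String genre;
        final int released;

        Album(String genre, int released) {
            this.genre = genre;
            this.released = released;
        }
    }

    static Map<String, Map<Integer, Integer>> pivot(List<Album> albums) {
        Map<String, Map<Integer, Integer>> result = new TreeMap<>();
        for (Album album : albums) {
            // First level: genre; second level: released year
            result.computeIfAbsent(album.genre, g -> new TreeMap<>())
                  .merge(album.released, 1, Integer::sum);
        }
        return result;
    }

    // Parent count = sum of its children (holds when released is single-valued)
    static int count(Map<Integer, Integer> children) {
        int total = 0;
        for (int c : children.values()) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Album> albums = new ArrayList<>();
        albums.add(new Album("Progressive Rock", 1992));
        albums.add(new Album("Progressive Rock", 1992));
        albums.add(new Album("Progressive Rock", 1969));
        albums.add(new Album("Rock", 1969));
        albums.add(new Album("Rock", 1986));
        System.out.println(pivot(albums));
        // prints {Progressive Rock={1969=1, 1992=2}, Rock={1969=1, 1986=1}}
    }
}
```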
Interval facets
Interval facets were introduced in Solr 4.10. They can be seen as an alternative to facet range queries because they allow you to set interval criteria for one or more fields and count the number of matching documents that have values within those constraints.
Although the same result can be achieved with facet range queries, this implementation could provide performance improvements in several contexts. As suggested in the Solr reference guide, it is recommended that you try both methods.
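Interval facets are requested per field with facet.interval, and each interval is typically passed through the per-field f.<field>.facet.interval.set parameter, using a syntax in which square brackets mean inclusive and parentheses mean exclusive bounds, for example [1960,1970). The following plain-Java sketch (hypothetical class and method names; Java 10 or later for URLEncoder.encode with a Charset) builds such a parameter string and, to show what the resulting counts mean, evaluates the same intervals against sample values in memory. It is not Solr's implementation.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of interval facet requests and counts.
final class IntervalFacetSketch {

    // One interval: "[" or "(" start "," end "]" or ")"
    static final class Interval {
        final int start;
        final boolean startInclusive;
        final int end;
        final boolean endInclusive;

        Interval(int start, boolean startInclusive, int end, boolean endInclusive) {
            this.start = start;
            this.startInclusive = startInclusive;
            this.end = end;
            this.endInclusive = endInclusive;
        }

        boolean contains(int value) {
            boolean afterStart = startInclusive ? value >= start : value > start;
            boolean beforeEnd = endInclusive ? value <= end : value < end;
            return afterStart && beforeEnd;
        }

        String toSolrSyntax() {
            return (startInclusive ? "[" : "(") + start + "," + end + (endInclusive ? "]" : ")");
        }
    }

    // Builds URL-encoded request parameters for the given field and intervals
    static String requestParams(String field, List<Interval> intervals) {
        StringBuilder params = new StringBuilder("facet=true&facet.interval=").append(field);
        for (Interval interval : intervals) {
            params.append("&f.").append(field).append(".facet.interval.set=")
                  .append(URLEncoder.encode(interval.toSolrSyntax(), StandardCharsets.UTF_8));
        }
        return params.toString();
    }

    // Counts matching values for each interval independently
    static Map<String, Integer> count(int[] values, List<Interval> intervals) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Interval interval : intervals) {
            int matches = 0;
            for (int value : values) {
                if (interval.contains(value)) {
                    matches++;
                }
            }
            counts.put(interval.toSolrSyntax(), matches);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Interval> sixties = List.of(new Interval(1960, true, 1970, false));
        System.out.println(requestParams("released", sixties));
        // prints facet=true&facet.interval=released&f.released.facet.interval.set=%5B1960%2C1970%29
    }
}
```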
Highlighting
The highlight component contributes to search results by adding a section that contains (for each document in the current result page) a set of snippets highlighting the search terms found in the document content (that is, in one or more fields of the document). The following screenshot shows a web application that uses the highlighting feature:
This feature is particularly useful when your data comes from rich documents such as PDFs or Microsoft Office documents (as shown in the preceding example). Using the highlighting feature, it’s possible to give the end user an approximate idea of the context in which the entered terms have been found within the document.
Tip
Within the example Solr instance associated with this chapter, there is a request handler called /highlight that enables this feature on the title and artist fields.
The highlighting component can be tuned, or configured, with several parameters. Fortunately, the provided default values work well in many scenarios. Some of those parameters are described in the following table:
Parameter	Description
hl	Turns highlighting off or on. The default value is false.
hl.q	Terms to be highlighted are taken from the main query unless this parameter, whose value is itself a query, is specified.
hl.fl	A space- or comma-separated list of fields that will be used for highlighting. Snippets will come only from these fields.
hl.snippets	The maximum number of highlighted snippets that will be returned for each field. The default value is 1.
hl.maxAnalyzedChars	The maximum number of characters that will be inspected (in a given field) to compute the snippets.
hl.simple.pre / hl.simple.post	Indicate the text that should appear before and after a highlighted term. They default to the <em> and </em> HTML tags, respectively.
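To see what hl.simple.pre and hl.simple.post do to a snippet, here is a deliberately naive plain-Java sketch. The highlight helper is hypothetical: it wraps whole-word, case-insensitive matches of the given terms, whereas Solr matches the terms produced by the field’s analyzer chain and also takes care of selecting fragments.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Naive illustration of pre/post markers around matched terms (not Solr's highlighter).
final class SimpleHighlightSketch {

    static String highlight(String text, List<String> terms, String pre, String post) {
        if (terms.isEmpty()) {
            return text; // nothing to highlight
        }
        // \b(?:term1|term2|...)\b, each term quoted so it is matched literally
        String alternation = terms.stream().map(Pattern::quote).collect(Collectors.joining("|"));
        Pattern pattern = Pattern.compile("\\b(?:" + alternation + ")\\b",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        // $0 is the matched text; pre/post are quoted so '$' or '\' stay literal
        String replacement = Matcher.quoteReplacement(pre) + "$0" + Matcher.quoteReplacement(post);
        return pattern.matcher(text).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // <em> and </em> are the defaults of hl.simple.pre / hl.simple.post
        System.out.println(highlight("Delicate Sound of Thunder", List.of("sound"), "<em>", "</em>"));
        // prints Delicate <em>Sound</em> of Thunder
    }
}
```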
Solr comes with three different kinds of highlighters, described in the following sections.
Standard highlighter
This is the first highlighter that was introduced in Solr, and Solr uses it by default. It is able to work on top of a lot of query types and doesn’t have any special requirements for the fields to be highlighted. However, in order to speed up its work, termVectors should be turned on (for those fields).
Fast vector highlighter
The fast vector highlighter is the second type of highlighter introduced in Solr. It requires that termVectors, termPositions, and termOffsets are turned on for each field that needs to be highlighted. That allows fast and scalable execution, especially with documents containing large amounts of text, but requires a lot of extra space for the index. However, it supports fewer query types.
The fast vector highlighter can be enabled by setting the hl.useFastVectorHighlighter parameter to true.
Note that, if the preceding flags are not set for the target fields, Solr will fall back to the standard highlighter.
Postings highlighter
This highlighter doesn’t use term vectors, nor does it reanalyze the text to be highlighted. It only requires the storeOffsetsWithPositions flag to be set for the fields to be highlighted. Unlike the others, this highlighter must be explicitly declared in the solrconfig.xml file with the following declaration:
<searchComponent class="solr.HighlightComponent" name="highlight">
  <highlighting class="org.apache.solr.highlight.PostingsSolrHighlighter"/>
</searchComponent>
This is a good compromise, compared with the first two highlighters, in terms of performance and index space. The information (that is, the posting offsets) required by the storeOffsetsWithPositions flag is cheaper than term vectors in terms of memory and disk occupation. However, it is meant to highlight simple query terms, so it could produce unexpected or unwanted results with phrase queries.
More like this
The more like this search component allows us to find documents that have some kind of similarity with a given document. There are several ways to use this feature in Solr:
MoreLikeThisHandler: This is a front controller that is completely dedicated to “more like this” requests. It accepts a query that identifies a document, and looks for similar documents according to a configured criterion.
MoreLikeThisHandler with a content stream: This is the same handler, but instead of taking a document as the input (matched by a given query), the text used to compute similarity can be passed directly or fetched from a URL.
MoreLikeThisComponent: As a search component, it executes the similarity search for each document of the current result page, thus appending a “more like this” section to the Solr response, with a list of similar documents for each document. This is not really recommended because it could slow down overall query execution.
In general, the first type is the most widely used. MoreLikeThis doesn’t have special requirements for the fields that are used for the similarity computation. However, for best performance, TermVectors should be enabled for them.
The following table illustrates the parameters accepted by this component:
Parameter	Description
mlt	Turns the MoreLikeThis component off or on. It defaults to false.
mlt.count	The maximum number of similar documents that will be returned (for each document).
mlt.fl	The fields used for similarity. They should have TermVectors enabled (recommended) or they need to be stored.
mlt.qf	The fields (already declared in mlt.fl) with corresponding boosts, in the same format used by the DisMax qf parameter.
mlt.minwl / mlt.maxwl	The minimum and maximum word length boundaries. Words shorter than mlt.minwl or longer than mlt.maxwl are ignored.
mlt.boost	A flag indicating whether the query will be boosted by the relevance of the interesting terms. It defaults to false.
mlt.mintf	The minimum term frequency boundary: terms that occur fewer times than this in the source document are ignored. It defaults to 2.
mlt.mindf	The minimum document frequency boundary: terms that occur in fewer documents than this are ignored. It defaults to 5.
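The word-length and frequency bounds in the table act as filters on the candidate terms taken from the source document. The following plain-Java sketch shows that pruning step in isolation, using made-up term and document frequencies. The interestingTerms helper is hypothetical, it treats a bound of 0 as "disabled" (an assumption of this sketch), and it leaves out how MoreLikeThis scores and boosts the surviving terms.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of mlt.mintf / mlt.mindf / mlt.minwl / mlt.maxwl pruning.
final class MltTermFilterSketch {

    static List<String> interestingTerms(Map<String, Integer> termFreqs,
                                         Map<String, Integer> docFreqs,
                                         int minTf, int minDf, int minWl, int maxWl) {
        List<String> terms = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : termFreqs.entrySet()) {
            String term = entry.getKey();
            int length = term.length();
            if (minWl > 0 && length < minWl) continue;            // too short (mlt.minwl)
            if (maxWl > 0 && length > maxWl) continue;            // too long (mlt.maxwl)
            if (minTf > 0 && entry.getValue() < minTf) continue;  // too rare in the source document (mlt.mintf)
            int df = docFreqs.getOrDefault(term, 0);
            if (minDf > 0 && df < minDf) continue;                // in too few documents (mlt.mindf)
            terms.add(term);
        }
        Collections.sort(terms); // deterministic order for display only
        return terms;
    }

    public static void main(String[] args) {
        Map<String, Integer> tf = Map.of("jazz", 3, "bass", 2, "a", 5, "fusion", 1, "miles", 2);
        Map<String, Integer> df = Map.of("jazz", 40, "bass", 12, "a", 900, "fusion", 30, "miles", 3);
        // mintf=2 and mindf=5 are the defaults from the table; minwl=2 is an arbitrary choice
        System.out.println(interestingTerms(tf, df, 2, 5, 2, 0));
        // prints [bass, jazz]
    }
}
```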
Other components
Other than the components we saw in the previous sections, there are other built-in search components that are part of the Solr framework. Remember that, if you want to use them, they will have to be explicitly declared and configured within the Solr configuration.
The following is a short and non-exhaustive list of additional components:
Query elevation: This is used to give more importance to some results using a criterion that has nothing to do with the normal Solr scoring algorithm. The component lets you associate a given query with a corresponding list of most important results.
Terms: This provides access to the Lucene internal term dictionary.
Stats: This provides statistics for numeric fields.
Spellcheck: This provides spellchecking capabilities by means of n-gram analysis of indexed documents or external dictionaries. From a functional point of view, this component is used to build the so-called “Did you mean?” feature, offering alternative search suggestions in case of user mistakes.
TermVector: This adds the term vectors (that is, term, frequency, position, offset, and IDF) of the matching documents to the response.
Debug: This adds debugging and explanatory information about the request execution.
Search handler
We saw request handlers in the previous chapter. There, we defined a request handler as a pluggable component that handles incoming requests. In that chapter, we were referring to update requests, that is, requests containing index update commands.
Here, we will focus our attention on SearchHandler, a special front controller used to handle incoming search requests. Although the SearchHandler class could be seen as the supertype layer of all search handlers, it is not abstract, and it defines a standard search behavior.
Standard request handler
StandardRequestHandler is an empty subclass of SearchHandler, so at the time of writing this book, using one or the other is basically the same. Request handlers are declared in the solrconfig.xml file, and they define search endpoints. Each instance is associated with a name prefixed by a slash (the name must be unique), an implementation class, and a set of configuration parameters:
<requestHandler name="/mySearcher" class="solr.SearchHandler">
  (configuration)
</requestHandler>
With the sample Solr instance running, you can try handlers declared in this way at the following URIs:
http://localhost:8983/solr/example/query
http://localhost:8983/solr/example/facets
http://localhost:8983/solr/example/jazz
Configuring a SearchHandler instance means defining configuration parameters and (optionally) the search components that will participate in the query execution chain.
Search components
Most of the time, unless you have a specific need, the search components that drive the logic of the search execution can be omitted because the following list will be automatically injected:
Code	Component
query	QueryComponent
facet	FacetComponent
mlt	MoreLikeThisComponent
highlight	HighlightComponent
stats	StatsComponent
debug	DebugComponent
Only the “query” component is enabled by default; the others need to be explicitly activated through request parameters (for example, facet=true or hl=true).
If the default chain is not what you need, it is possible to define a custom chain in the following way:
<arr name="components">
  <str>query</str>
  <str>facet</str>
  … other components follow
</arr>
This will completely replace the default chain. It is also possible to leave the default chain as it is and have additional prepended or appended components:
<arr name="first-components">
  <str>my_custom_component</str>
  … other components follow
</arr>
<arr name="last-components">
  <str>another_custom_component</str>
  … other components follow
</arr>
So, in general, the order of execution for search components will be the following:
1. Components declared as “first-components” (optional).
2. Components declared as “components”. In their absence, the default chain will be used.
3. Components declared as “last-components” (optional).
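The ordering rule above can be expressed in a few lines. The following plain-Java sketch (hypothetical class and method names) takes the three optional lists and returns the resulting chain, using the default chain from the preceding table when no components list is given. It only models the ordering described here; it does not reproduce Solr’s SearchHandler code or the validation Solr performs on these declarations.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the first-components / components / last-components ordering.
final class ComponentChainSketch {

    // Default chain, as listed in the table above
    static final List<String> DEFAULT_CHAIN =
            List.of("query", "facet", "mlt", "highlight", "stats", "debug");

    static List<String> effectiveChain(List<String> first, List<String> components, List<String> last) {
        List<String> chain = new ArrayList<>();
        if (first != null) {
            chain.addAll(first);
        }
        // An explicit "components" list replaces the whole default chain
        chain.addAll(components != null ? components : DEFAULT_CHAIN);
        if (last != null) {
            chain.addAll(last);
        }
        return chain;
    }

    public static void main(String[] args) {
        // A handler that only appends a custom "prices" component
        System.out.println(effectiveChain(null, null, List.of("prices")));
        // prints [query, facet, mlt, highlight, stats, debug, prices]
    }
}
```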
The following is an example declaration of StandardRequestHandler:
<requestHandler name="/jazz" class="solr.StandardRequestHandler">
  <!-- parameters that will always be applied to the incoming requests -->
  <lst name="invariants">
    <int name="rows">10</int>
  </lst>
  <!-- parameters that will always be added to the incoming requests -->
  <lst name="appends">
    <str name="fq">genre:jazz</str>
  </lst>
  <!-- default settings that can be overridden by the incoming requests -->
  <lst name="defaults">
    <str name="sort">title asc</str>
    <str name="echoParams">explicit</str>
    <str name="q">*:*</str>
    <bool name="facet">false</bool>
  </lst>
  <!-- This is a custom search component that will run after the default
       component chain -->
  <arr name="last-components">
    <str>prices</str>
  </arr>
</requestHandler>
Query parameters
The request handlers and the search components involved in the chain accept several parameters to drive their execution logic. These parameters (with corresponding values) can be declared in three different sections:
defaults: Parameter values will be used unless overridden by incoming requests.
appends: Parameter values will be appended to each request.
invariants: Parameter values will always be applied and cannot be overridden by incoming requests or by the values declared in the defaults and appends sections.
All sections are optional, so you can have no parameters configured for a given handler and let the incoming requests define them. This is an example of a handler configuration:
<lst name="defaults">
  <str name="defType">edismax</str>
</lst>
<lst name="appends">
  <str name="facet.field">artist</str>
  <str name="facet.field">genre</str>
</lst>
<lst name="invariants">
  <str name="wt">json</str>
  <bool name="facet">true</bool>
</lst>
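The interplay of the three sections can be modeled with plain maps of multi-valued parameters. The following sketch is a hypothetical helper, not Solr’s code. It applies defaults only to missing parameters, then adds the appended values, and finally lets invariants replace whatever is there, using the values of the preceding configuration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of how defaults, appends, and invariants combine with a request.
final class ParamMergeSketch {

    static Map<String, List<String>> merge(Map<String, List<String>> request,
                                           Map<String, List<String>> defaults,
                                           Map<String, List<String>> appends,
                                           Map<String, List<String>> invariants) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        request.forEach((name, values) -> result.put(name, new ArrayList<>(values)));
        // defaults: used only when the request does not provide the parameter
        defaults.forEach((name, values) -> result.putIfAbsent(name, new ArrayList<>(values)));
        // appends: values are added to whatever is already there
        appends.forEach((name, values) ->
                result.computeIfAbsent(name, n -> new ArrayList<>()).addAll(values));
        // invariants: always win, replacing any other value
        invariants.forEach((name, values) -> result.put(name, new ArrayList<>(values)));
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> request = new LinkedHashMap<>();
        request.put("q", List.of("Miles Davis"));
        request.put("wt", List.of("xml"));
        Map<String, List<String>> merged = merge(
                request,
                Map.of("defType", List.of("edismax")),
                Map.of("facet.field", List.of("artist", "genre")),
                Map.of("wt", List.of("json"), "facet", List.of("true")));
        System.out.println(merged);
        // prints {q=[Miles Davis], wt=[json], defType=[edismax], facet.field=[artist, genre], facet=[true]}
    }
}
```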
RealTimeGetHandler
RealTimeGetHandler is basically a SearchHandler subclass that adds RealTimeGetComponent to the search request execution. In this way, it’s possible to retrieve the latest version of documents by specifying their identifiers, even if those documents have not been committed yet.
In order to enable such a component, you must turn the update log feature on in solrconfig.xml:
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  …
</updateHandler>
Then the request handler can be declared and configured using the procedure that we saw in the previous section:
<requestHandler name="/get" class="solr.RealTimeGetHandler">
  …
</requestHandler>
This handler accepts an additional id or ids parameter that allows us to specify the identifiers of the documents we want to retrieve. The id parameter accepts one identifier and can be repeated in requests. The ids parameter accepts a comma-separated list of identifiers.
Tip
Once the example Solr instance is up, this handler responds to /get requests.
Response output writers
As a last step, query results are returned to requestors in a given format. Solr communicates with clients using the HTTP protocol. Those clients are free to start the interaction by asking for one format or another, depending on their needs.
Although a default format can be set, the client can override it by means of the wt parameter. The value of the wt parameter is a mnemonic code associated with an available response writer.
There are several built-in response writers in Solr, which are described here:
Response writer	Description
xml	The eXtensible Markup Language response writer. This is the default writer.
xslt	Combines the XML results with an XSLT file in order to produce custom XML documents.
json	JavaScript Object Notation response writer.
csv	Comma-Separated Values response writer.
velocity	This uses Apache Velocity to directly build web pages with query results. It is very useful for fast prototyping.
javabin	Java clients have a privileged way to obtain results from Solr using this response writer, which outputs a compact binary format that the Java client turns directly into Java objects.
python, ruby, php	Specialized response writers for these languages that produce a structure directly tied to the language requirements.
Extending Solr
The following sections will describe and illustrate a couple of ways of extending and customizing searches in Solr.
Mixing real-time and indexed data
Sometimes, as a part of your search results, you may want to have data that is not managed by Solr but retrieved from a real-time source, such as a database.
Think of an e-commerce application; when you search for something, you will see two pieces of information beside each item:
Price: This could be the result of some kind of frequently updated marketing policy. Non-real-time information could cause problems on the vendor side (for example, a wrong price policy could be applied).
Availability: Here, wrong information could cause an invalid claim from customers; for example, “I bought that book because I saw it as available, but it isn’t!”
This is a good scenario for developing a search component. We will create our search component and associate it with a given RequestHandler.
A search component is basically a class that extends (not surprisingly) org.apache.solr.handler.component.SearchComponent:
public class RealTimePriceComponent extends SearchComponent
The initialization of the component is done in a method called init. Here, we will most probably get from the configuration the JNDI name of the target data source, that is, the source the prices must be retrieved from:
public void init(NamedList args) {
  // JNDI name of the price data source, as configured in solrconfig.xml
  String dsName = SolrParams.toSolrParams(args).get("ds-name");
  try {
    Context ctx = new InitialContext();
    this.datasource = (DataSource) ctx.lookup(dsName);
  } catch (NamingException e) {
    throw new IllegalStateException("Unable to look up data source " + dsName, e);
  }
}
Now we are ready to process the incoming requests. This is done in the process method, which receives a ResponseBuilder instance, the object we will use to add the component contribution to the search output. Since this component will run after the query component, it will find a list containing the query results in ResponseBuilder. For each item within those results, our component will query the database in order to find the corresponding price:
public void process(ResponseBuilder builder) throws IOException {
  SolrIndexSearcher searcher = builder.req.getSearcher();
  // Holds the component contribution (document id -> price)
  NamedList<BigDecimal> result = new SimpleOrderedMap<BigDecimal>();
  for (DocIterator iterator = builder.getResults().docList.iterator();
       iterator.hasNext();) {
    // This is the Lucene internal document id
    int docId = iterator.nextDoc();
    // fieldset is a Set<String> field of this class (not shown), e.g. containing "id"
    Document ldoc = searcher.doc(docId, fieldset);
    // This is the Solr document id
    String id = ldoc.get("id");
    // Get the price of the item (getPrice queries the data source; not shown)
    BigDecimal price = getPrice(id);
    // Add the price of the item to the component contribution
    result.add(id, price);
  }
  // Add the component contribution to the response builder
  builder.rsp.add("prices", result);
}
In solrconfig.xml, we must declare the component in two places. First, we must declare and configure it in the following manner:
<searchComponent name="prices" class="a.b.c.RealTimePriceComponent">
  <str name="ds-name">jdbc/prices</str>
</searchComponent>
Then it has to be enabled in request handlers (as shown in the following snippet). Since this component is supposed to contribute to a set of query results, it must be placed after the query component:
<requestHandler name="/xyz" …>
  …
  <arr name="last-components">
    <str>prices</str>
  </arr>
</requestHandler>
Done! If you run a query invoking the /xyz request handler, you will see, after the query results, a new section called prices (the name we used for the search component). It reports the document id and the corresponding price for each document in the search results.
Tip
You can find the source code of the entire example in the src folder of the project associated with this chapter, under the org.gazzax.labs.solr.ase.ch3.sp package.
If you want to start Solr with that component, just run the following command from the command line or from Eclipse:
mvn clean install cargo:run -Pcustom-search-component
Using a custom response writer
In a project I was working on, we implemented the autocomplete feature, that is, a list of suggestions that quickly appears under the search field each time a user types a key, while the search string is gradually composed. The following screenshot shows this feature:
A new response writer was implemented because the user interface widget had already been built by another company, and the exchange format between that widget and the search service had already been defined.
Doing that in Solr is very easy. A response writer is a class that implements the org.apache.solr.response.QueryResponseWriter interface. Like all Solr components, it can be optionally initialized using an init callback method, and it provides a write method where the response should be serialized according to a given format:
public void write(
    Writer writer,
    SolrQueryRequest request,
    SolrQueryResponse response) throws IOException {
  // 1. Get a reference to the values that make up the current response
  NamedList<?> elements = response.getValues();
  // 2. Use a StringBuilder to build the output
  StringBuilder builder = new StringBuilder("{")
      .append("query:'")
      .append(request.getParams().get(CommonParams.Q))
      .append("',")
      .append("suggestions:[");
  // 3. Get a reference to the object that holds the query result
  //    (looked up by name rather than by position)
  Object value = elements.get("response");
  if (value instanceof ResultContext) {
    ResultContext context = (ResultContext) value;
    // The ordered list (actually the page subset)
    // of matched documents
    DocList ids = context.docs;
    if (ids != null) {
      SolrIndexSearcher searcher = request.getSearcher();
      DocIterator iterator = ids.iterator();
      // 4. Iterate over documents
      for (int i = 0; i < ids.size(); i++) {
        // 5. For each document we need to get the "label" attribute
        //    (FIELDS is a Set<String> constant of this class, not shown)
        Document document = searcher.doc(iterator.nextDoc(), FIELDS);
        if (i > 0) {
          builder.append(",");
        }
        // 6. Append the label value to the writer output
        builder
            .append("'")
            .append(document.get("label"))
            .append("'");
      }
    }
  }
  // Always close the suggestions list and the object, even with no results
  builder.append("]").append("}");
  // 7. And finally write out the result
  writer.write(builder.toString());
}
That’s all! Now try issuing a query like this: http://127.0.0.1:8983/solr/example/auto?q=ma
Solr will return the following response:
{
  query:'ma',
  suggestions:['Marcus Miller','Michael Manring','Got a match','Nigerian Marketplace','The Crying machine']
}
Tip
You can find the source code of the entire example under the org.gazzax.labs.solr.ase.ch3.rw package of the source folder in the project associated with this chapter.
If you want to start Solr with that writer, run the following command from the command line or from Eclipse:
mvn clean install cargo:run -Pcustom-response-writer
Troubleshooting
This section will provide help, tips, and suggestions about difficulties that you could run into while you’re experimenting with what we described in this chapter.
Queries don’t match expected documents
There’s no single answer to this big and popular question. Without any additional information, the first two things I would do are as follows:
Retry the query by appending debug parameters (for example, debugQuery and explainOther) and analyze the explain section. There’s a wonderful online tool (http://explain.solr.pl) that makes life easier by explaining debug information.
Use the field analysis page, type some sample values, and see what happens at index and query time. Your analyzer chains are probably not consistent.
Mismatch between index and query analyzer
Using different analyzer chains at index and query time sometimes causes problems because the tokens produced at query time don’t match the tokens produced at index time, as one would expect. The field analysis page helps a lot in debugging these situations. Type a value for a field and see what happens at query and index time. In addition, this page highlights all the matches between index and query tokens.
No score is returned in response
The score field is a virtual field that must be explicitly asked for in requests. A value of * in the fl parameter is not enough because * means “all real fields.” A request for all real fields that also includes the score must provide an fl parameter with the value *,score. Note that this is valid in general for all virtual fields (for example, functions, transformers, and so on).
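When requests are built programmatically, it is easy to forget the score. The following hypothetical helper (plain Java, not part of any Solr API) adds score to an fl value unless it is already listed, treating a missing fl as * (all stored fields).

```java
import java.util.Arrays;

// Hypothetical helper: makes sure the virtual score field is requested.
final class FieldListSketch {

    static String withScore(String fl) {
        if (fl == null || fl.trim().isEmpty()) {
            return "*,score"; // no fl means all stored fields; score must still be added
        }
        // fl entries may be separated by commas and/or spaces
        boolean hasScore = Arrays.stream(fl.trim().split("[,\\s]+"))
                .anyMatch("score"::equals);
        return hasScore ? fl : fl + ",score";
    }

    public static void main(String[] args) {
        System.out.println(withScore("*"));
        // prints *,score
    }
}
```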
Summary
In this chapter we met the Solr search capabilities, a huge set of features that power up information retrieval on Solr. We saw a lot of tools used to improve the search experience of clients, requestors, and, last but not least, end users. After examining the indexing phase, you can well imagine that search and information retrieval constitute the actual functional goals of a full-text search platform.
We met the different pieces that make up Solr’s search capabilities: analyzers, tokenizers, query parsers, search components, and output writers. For all of them, Solr provides a good set of alternatives, already implemented and ready to use. For those who have specific requirements, it is always possible to create customizations and extensions.
In the next chapter, keeping in mind the big picture of the crucial phases in an information retrieval system, we will take a look at client APIs. The available libraries are great examples of how to use Solr’s HTTP services to work with it programmatically on the client side.
Chapter 4. Client API
A search application needs to interact with Solr by issuing index and search requests. Although Solr exposes these services through HTTP, working at that (low) level is not so easy for a developer. Client APIs are façade libraries that hide the low-level details of client-server communication. They allow us to interact with Solr using client-native constructs and structures, such as the so-called Plain Old Java Object (POJO) in the Java programming language.
In this chapter we will describe Solrj, the official Solr client Java library. We will also describe the structure and the main classes involved in index and search operations. The chapter will cover the following topics:
Solrj: the official Java client library
Other available bindings
Solrj
Solrj is the name of the official Solr Java client. It completely abstracts the underlying (HTTP) transport layer and offers a simple interface that client applications can use to interact with Solr.
SolrServer – the Solr façade
A client library necessarily needs a façade or a proxy, that is, an object representing the remote resource that hides and abstracts the low-level details of client-server interaction. In Solrj, this role is played by classes that extend the org.apache.solr.client.solrj.SolrServer abstract class. At the time of writing this book, these are the available SolrServer implementations:
EmbeddedSolrServer: This connects to a local SolrCore without requiring an HTTP connection. This is not recommended in production but is definitely useful for unit tests and development.
HttpSolrServer: This is a proxy that connects to a remote Solr using an HTTP connection.
LBHttpSolrServer: A proxy that wraps multiple HttpSolrServer instances and implements client-side, round-robin load balancing between them. It also periodically checks the (running) state of each server, removing members from the round-robin list or adding them back accordingly.
ConcurrentUpdateSolrServer: This is a proxy that uses an asynchronous queue to buffer input data (that is, documents). Once a given buffer threshold is reached, data is sent to Solr using a configurable number of dequeuer threads.
CloudSolrServer: A proxy used to communicate with SolrCloud.
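The round-robin behavior described for LBHttpSolrServer can be pictured with a small plain-Java sketch. Everything here is hypothetical (class name, methods, and server names) and greatly simplified: a failed server is skipped until it is marked alive again, which is roughly the role played by the periodic checks mentioned above. It is not LBHttpSolrServer’s implementation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of round-robin load balancing with a dead-server list.
final class RoundRobinSketch {

    private final List<String> servers;
    private final Set<String> dead = new HashSet<>();
    private int next = 0;

    RoundRobinSketch(List<String> servers) {
        this.servers = new ArrayList<>(servers);
    }

    // Called when a request to a server fails
    synchronized void markDead(String server) {
        dead.add(server);
    }

    // Called when a periodic check finds the server running again
    synchronized void markAlive(String server) {
        dead.remove(server);
    }

    // Returns the next alive server in round-robin order, or null if none is alive
    synchronized String nextServer() {
        for (int attempt = 0; attempt < servers.size(); attempt++) {
            String candidate = servers.get(next);
            next = (next + 1) % servers.size();
            if (!dead.contains(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        RoundRobinSketch lb = new RoundRobinSketch(List.of("s1", "s2", "s3"));
        lb.markDead("s2");
        System.out.println(lb.nextServer() + " " + lb.nextServer() + " " + lb.nextServer());
        // prints s1 s3 s1
    }
}
```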
Although all the SolrServer implementations mentioned previously offer the same functionality, HttpSolrServer and LBHttpSolrServer are better suited for issuing queries, while ConcurrentUpdateSolrServer is recommended for update requests.
Tip
The test case org.gazzax.labs.solr.ase.ch3.index.SolrServersITCase contains several methods that demonstrate how to index data using different types of servers.
Input and output data transfer objects
As described in the previous chapters, a Document is a central concept in Solr. It represents an atomic unit of information exchanged between the client and the server. The Solr API separates input documents from output documents using the SolrInputDocument and SolrDocument classes, respectively.
Although they share basic data transfer object behavior, each of them has its own specific features, associated with the direction of the client-server interaction in which it is used.
SolrInputDocument is a write object. You can add, change, and remove fields in it. You can also set a name, a value, and an optional boost for each of them:
public void addField(String name, Object value)
public void addField(String name, Object value, float boost)
public void setField(String name, Object value)
public void setField(String name, Object value, float boost)
SolrDocument is the output data transfer object, and it is primarily intended as a query result holder. Here, you can get field values, field names, and so on:
public Object getFieldValue(String name)
public Collection<Object> getFieldValues(String name)
public Object getFirstValue(String name)
Within an UpdateRequestProcessor instance, or while adding data to Solr, we will use SolrInputDocument instances. In QueryResponse (that is, the result of a query execution), we will find SolrDocument instances.
Tip
All the examples in the sample project associated with this chapter make extensive use of these data transfer objects.
Adds and deletes
Once a valid reference to a SolrServer has been created, adding data to Solr is very easy. The SolrServer class defines several methods to do this:
UpdateResponse add(SolrInputDocument document)
UpdateResponse add(Collection<SolrInputDocument> documents)
So we first create one or more SolrInputDocument instances filled with the appropriate data:
final SolrInputDocument doc1 = new SolrInputDocument();
doc1.setField("id", 1234);
doc1.setField("title", "Delicate Sound of Thunder");
doc1.addField("genre", "Rock");
doc1.addField("genre", "Progressive Rock");
Then, using the proxy instance, we can add that data:
solrServer.add(doc1);
Finally, we can commit:
solrServer.commit();
We can also accumulate all the documents within a list and use that as the argument of the add method.
Following the same logic as described in the second chapter for REST services, SolrServer provides the following methods to delete documents:
UpdateResponse deleteById(String id)
UpdateResponse deleteById(String id, int commitWithinMs)
UpdateResponse deleteById(List<String> ids)
UpdateResponse deleteById(List<String> ids, int commitWithinMs)
UpdateResponse deleteByQuery(String query)
UpdateResponse deleteByQuery(String query, int commitWithinMs)
Tip
The org.gazzax.labs.solr.ase.ch3.index.SolrServersITCase test case contains several methods that illustrate how to index and delete data.
Search
Searching with Solrj requires knowledge of (mainly) two classes: org.apache.solr.client.solrj.SolrQuery and org.apache.solr.client.solrj.response.QueryResponse. The first is an object representation of a query that can be sent to Solr. It allows us to inject all the parameters we described in the previous chapter. One way of doing this is through dedicated methods, such as these:
SolrQuery setQuery(String query)
SolrQuery setRequestHandler(String qt)
SolrQuery addSort(String field, SolrQuery.ORDER order)
SolrQuery setStart(Integer start)
SolrQuery setFacet(boolean b)
SolrQuery addFacetField(String... fields)
SolrQuery setHighlight(boolean b)
SolrQuery setHighlightSnippets(int num)
…
Alternatively, generic setter methods can be used:
SolrQuery setParam(String name, String... values)
SolrQuery setParam(String name, boolean value)
Note that all the preceding methods return the same SolrQuery object, thus allowing a caller to chain method calls, like this:
SolrQuery query = new SolrQuery()
    .setQuery("Charles Mingus")
    .setFacet(true)
    .addFacetField("genre")
    .addSort("title", SolrQuery.ORDER.asc)
    .addSort("released", SolrQuery.ORDER.desc)
    .setHighlight(true);
Once a SolrQuery has been built, we can use the appropriate method in the SolrServer proxy to send the query request:
QueryResponse query(SolrParams params)
The method returns a QueryResponse, which is an object representation of the response that Solr sent back as a result of the query execution. With that object, we can get the list of SolrDocuments of the currently returned page. We can also get facets and their values, and, in general, we can inspect and access any part of the response.
Tip
The org.gazzax.labs.solr.ase.ch3.search.SearchITCase test case contains several examples that demonstrate how to query with Solrj.
The following is an example of the use of QueryResponse:
// Executes a query and gets the corresponding response
QueryResponse res = solrServer.query(aQuery);
// Gets the request execution elapsed time
long elapsedTime = res.getElapsedTime();
// Gets the results (i.e. a page of results)
SolrDocumentList results = res.getResults();
// How many total hits for this response
long totalHits = results.getNumFound();
// Iterates over the current page
for (SolrDocument document : results) {
  // Do something with the current document
  String title = (String) document.getFieldValue("title");
  …
}
// Gets the facet field "genre"
FacetField ff = res.getFacetField("genre");
// Iterates over the facet values
for (FacetField.Count count : ff.getValues()) {
  String name = count.getName(); // e.g. Jazz
  long hits = count.getCount();  // e.g. 19
}
// The highlighting section is a bit complicated, as the
// value object is a composite map where keys are the document identifiers,
// while values are maps with highlighted fields as keys and snippets
// (a list of snippets) as values.
Map<String, Map<String, List<String>>> hl = res.getHighlighting();
// Iterates over the highlighting section
for (Map.Entry<String, Map<String, List<String>>> docEntry : hl.entrySet()) {
  String docId = docEntry.getKey();
  // Iterates over highlighted fields
  for (Map.Entry<String, List<String>> fieldEntry : docEntry.getValue().entrySet()) {
    String fieldName = fieldEntry.getKey();
    // Iterates over snippets
    for (String snippet : fieldEntry.getValue()) {
      // Do something with the snippet
    }
  }
}
Other bindings
Solrj is a very powerful client API, but of course, it is only available for Java clients. Since Solr services are exposed using standard HTTP procedures, other client API implementations have been created for other languages. Hence, it is possible to interact with Solr using Python, Perl, Ruby, .NET, or your favorite programming language.
The following table lists some of them, together with their location (only Solrj is part of the Solr distribution; all the other client libraries are independent projects):
Project	Language	Address
sunburnt	Python	https://pypi.python.org/pypi/sunburnt
pysolr	Python	https://pypi.python.org/pypi/pysolr/3.2.0
solrcloudpy	Python	https://pypi.python.org/pypi/solrcloudpy
solr-ruby	Ruby	https://github.com/erikhatcher/solr-ruby-flare/tree/master/solr-ruby
Blacklight	Ruby	http://projectblacklight.org
Solarium	PHP	http://www.solarium-project.org/
Solr-PHP-UI	PHP	http://www.opensemanticsearch.org/solr-php-ui/
PECL/Solr	PHP	http://pecl.php.net/package/solr
Flux	Clojure	https://github.com/mwmitchell/flux
solr-scala-client	Scala	https://github.com/takezoe/solr-scala-client
SolrNet	.NET	https://github.com/mausch/SolrNet
A complete and updated list of all bindings is available at https://wiki.apache.org/solr/IntegratingSolr.
Summary
A distributed search system, such as Solr, requires remote service invocations to send and receive data across a network. Clients without appropriate APIs will be exposed to the complexity of dealing with the low-level details of the communication protocol.
Since Solr provides all core services through HTTP, a lot of client libraries have been developed to hide that complexity. Regardless of the concrete binding, a client library encapsulates the low-level details of client-server communication and provides a uniform service interface for clients.
In this chapter, we focused on the Solr client APIs, specifically on the official Java binding called Solrj, its main features, and the main classes involved in index and query operations.
We briefly described and listed some other popular bindings that have been developed on top of the Solr HTTP services.
In the next chapter, we will return to the server side to describe how to fine-tune and manage a Solr instance.
Chapter 5. Administering and Tuning Solr
You can manage a Solr installation using any of the several system administration tools provided with Solr. These tools include the Administration Console, the REST services, and the JMX API, with which you can manage and monitor cores, hardware resources, runtime configuration, and the health of the Solr environment to ensure maximum availability and performance.
Although the topic of administration is usually outside a developer’s sphere, you, as a provider of a solution based on Solr, will most probably need to know something about it. Specifically, you need to know about a set of tools that let you monitor Solr, tune it, and investigate problems.
Throughout this chapter, we will use a Solr instance preloaded with sample data. In order to have it up and running, you should check out the source code of the book, go to the ch5 folder, and run this (using Eclipse or from the command line):
# mvn clean install cargo:run
Tip
The ch5 sample project has a preconfigured Eclipse launcher used to run Solr. You can find it under the src/dev/eclipse folder. Just right-click on start-ch5-server.launch and select the Debug As menu item.
This chapter will describe the most relevant sections of the Solr administration console. We will also explore the JMX API. Each time a hardware resource is involved, we will talk about it. Specifically, this chapter will cover the following topics:
The Solr Administration Console
Usage of hardware resources
JConsole and JMX
Dashboard
The Administration Console is a web application that is part of Solr. You can access it through a web browser from any machine on the local network that can communicate with Solr.
Type http://127.0.0.1:8983/solr in the web browser's address bar. The first page that appears is the dashboard, as shown in the following screenshot:
This is where you can see general information about Solr (for example, the version, startup time, and so on) and about its hosting environment (for example, JVM version, JVM args, processors, physical and JVM memory, and file descriptors).
Physical and JVM memory
The first and the last gray bars on the right side of the dashboard represent the physical and JVM memory, respectively. The first measure is the amount of memory that is available on the hosting machine. The second measure is the amount assigned to the JVM at startup time by means of the -Xms and -Xmx options.
Tip: For a complete list of available JVM options, see https://docs.oracle.com/cd/E22289_01/html/821-1274/configuring-the-default-jvm-and-java-arguments.html.
Each bar reports both the available and the used amount of memory. As you can imagine, memory is one of the crucial factors for Solr performance and response times.
When we think about a web application, we may consider it a standalone container that, for example, reads data from an external database and shows some dynamic pages to end users. Solr is not like that; it is a service. Despite its web-application-like nature, it makes extensive use of local hardware resources such as disk and memory.
Solr uses memory (here, I'm referring to the JVM memory) for a lot of things, for example, caches, sorting, faceting, and indexing. Understanding all those mechanisms is therefore crucial to determine the right amount of memory to assign to the JVM.
Note: There's a useful spreadsheet (we already mentioned it in the first chapter) that you can find in the Solr source repository at https://svn.apache.org/repos/asf/lucene/dev/trunk/dev-tools/size-estimator-lucene-solr.xls. It is a good starting point for estimating RAM and disk space requirements.
However, a resource that is often considered external to the Solr domain is the system memory, that is, the memory that remains available to the operating system once the JVM memory has been deducted.
In an optimal situation, that memory should be enough to:
Let the operating system manage its resources.
Accommodate the Solr index. Ideally, if it can contain the whole index, there won't be any disk seeks.
The first point is quite obvious; an operating system needs a given amount of memory to manage its ordinary tasks.
The second point has to do with the so-called (OS) filesystem cache. The JVM works directly with the memory that we made available on the startup command line by means of the -Xms and -Xmx options. This is the memory our Java application uses to load object instances, implement application-level caches, and so on.
However, applications such as Solr that make wide use of filesystem resources (to load and write index files) also rely on another important part of memory. This part is available to the operating system, which uses it to cache files. Once a file is loaded, its content is kept in memory until the system needs that space for other purposes. Data in this filesystem cache can be accessed quickly, without any disk access or seek.
Note: Remember that this type of memory has nothing to do with the memory assigned to the JVM.
As you can imagine, this can dramatically improve overall performance in both the index (write) and query (read) phases. Sometimes the whole index cannot fit in the filesystem cache, because an index that is relatively small in terms of disk space can still be huge in terms of memory. In those cases, the system memory should be large enough to let the operating system load and evict index data from the filesystem cache efficiently.
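The reasoning above comes down to simple arithmetic. Here is a minimal Java sketch of it. It estimates how much of the index the filesystem cache could hold, assuming hypothetical figures for the host: 16 GB of RAM, a 6 GB maximum heap, about 2 GB kept for the operating system itself, and a 12 GB index on disk. The class and method names are illustrative only. In a running JVM, Runtime.getRuntime().maxMemory() reports the maximum heap, which roughly corresponds to -Xmx.

```java
// Rough estimate of how much of a Solr index the OS filesystem cache can hold.
// All inputs are hypothetical; measure your own host and index before sizing anything.
final class FsCacheBudget {

    /** Memory left to the operating system once the JVM heap has been deducted. */
    static long systemMemory(long physicalBytes, long maxHeapBytes) {
        return Math.max(0L, physicalBytes - maxHeapBytes);
    }

    /** Fraction (0..1) of the index that fits in the filesystem cache after an OS reserve. */
    static double cachedIndexFraction(long physicalBytes, long maxHeapBytes,
                                      long osReserveBytes, long indexBytes) {
        if (indexBytes <= 0) {
            return 1.0;
        }
        long forFileCache = Math.max(0L, systemMemory(physicalBytes, maxHeapBytes) - osReserveBytes);
        return Math.min(1.0, (double) forFileCache / indexBytes);
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        // Hypothetical host: 16 GB RAM, -Xmx6g, ~2 GB for the OS, 12 GB index on disk.
        double fraction = cachedIndexFraction(16 * gb, 6 * gb, 2 * gb, 12 * gb);
        System.out.printf("About %.0f%% of the index can stay in the filesystem cache%n", fraction * 100);
    }
}
```

With these numbers, 8 GB remain for the filesystem cache, so roughly two thirds of the index can stay in memory. Growing the heap would shrink that share.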
Disk usage
The dashboard page reports information about the swap space, but it says nothing about disk usage. This is because that kind of information is reported in a dedicated section for every managed core. Unfortunately, there isn't a central point where you can see the total disk space used by the instance.
As described in the previous section, the disk is a resource that Solr uses widely, and it plays a fundamental role in achieving optimal performance. Solid State Disks (SSDs) deserve a mention here, because they are usually a very good choice for fast reads and writes. But again, the most critical factor is understanding and tuning the filesystem cache; in the most extreme cases, it avoids disk seeks entirely. In a nutshell, SSDs are fast, but memory is better.
File descriptors
The third bar (shown in the previous screenshot) shows two values for the Java process that runs Solr, that is, the Java process of your servlet container. It shows the maximum number of file descriptors (light gray) and the number actually opened (dark gray).
A Solr index can be composed of a lot of files, each of which needs to be opened at least once. The incremental nature of a Solr index can exhaust all the available file descriptors, especially if you have many cores, frequent changes, commits, and optimizes. This is usually when you get an IOException (too many open files).
The first place where you can manage and limit the number of files used by Solr is Solr itself. Within the solrconfig.xml file, you'll find a <mergeFactor> parameter in the <indexConfig> section. This parameter decides how many segments will be merged at a time.
The Solr/Lucene index is composed of multiple subindexes called segments. Each segment is an independent index composed of several files. When documents are added, updated, or deleted, Solr asynchronously persists those changes by creating new segments or merging existing ones. This is why the total number of files making up the index keeps changing. It changes gradually, as a reasonable amount of changes is applied to your dataset, and therefore needs to be monitored.
With mergeFactor set to 10 (the default value), there will be no more than nine segments at a given moment. When an update threshold (the maxBufferedDocs or ramBufferSizeMB parameter) is reached, a new segment is created. If the total number of segments then equals the configured mergeFactor, Solr will attempt to merge all existing segments into a new segment.
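To make the mergeFactor behavior concrete, here is a toy Java model of exactly the rule stated above. Every flush adds one segment, and when the segment count reaches mergeFactor, all segments are merged into one. This is a deliberate simplification and not Lucene code. Lucene's actual merge policies (for example, LogMergePolicy and TieredMergePolicy) select segments to merge by size, so real segment counts can differ.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the rule described in the text, not Lucene's real merge policy:
// each flush creates one segment; when the segment count reaches mergeFactor,
// all segments are merged into a single new one.
final class MergeFactorModel {

    /** Highest number of segments left on disk after any flush, over a run of flushes. */
    static int maxSegmentsObserved(int mergeFactor, int flushes) {
        List<Integer> segments = new ArrayList<>(); // each value: flushed batches contained in the segment
        int max = 0;
        for (int i = 0; i < flushes; i++) {
            segments.add(1); // maxBufferedDocs or ramBufferSizeMB threshold reached: new segment
            if (segments.size() == mergeFactor) {
                int merged = 0;
                for (int batches : segments) {
                    merged += batches;
                }
                segments.clear();
                segments.add(merged); // all existing segments merged into one
            }
            max = Math.max(max, segments.size());
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(maxSegmentsObserved(10, 1000)); // prints 9
    }
}
```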
Another parameter in the solrconfig.xml file that affects the number of open files is <useCompoundFile>. If it is set to true (it defaults to false), Solr combines the files that make up a segment into a single file. This may reduce the number of open file descriptors, but it may also cause some performance issues because of the monolithic nature of the compound file.
On top of that, in some scenarios a large number of files is the natural consequence of your infrastructure. Think of a system with several cores, for example. The previous settings are specific to a single core, but what if you have a lot of them?
Tip: When I use Solr for library search services, I usually create at least six cores: one for the main index, one that holds the headings used for the autocompletion feature, and one for each alphabetical index (for example, authors, titles, subjects, and publishers). Some customers require up to 50 alphabetical indexes (which means up to 50 cores).
In such cases, after analyzing your application and confirming that it really needs more file descriptors than the default (usually 1024), you may want to increase that limit with the ulimit command, as follows:
# ulimit -n 5000
Here, 5000 is the new limit. Note that raising the limit may require root privileges (raising it above the hard limit always does), and that the command applies the new limit only to the current session. To make it permanent, configure the value in the /etc/security/limits.conf file.
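The two numbers shown by the file descriptor bar can also be read programmatically. This sketch reads them for the JVM it runs in, using com.sun.management.UnixOperatingSystemMXBean. That interface is a HotSpot/OpenJDK extension available only on Unix-like systems, so the sketch returns null elsewhere. To watch Solr's own process, you would have to read the same platform MBean through a remote JMX connection. The 80% alert threshold is an arbitrary assumption.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Reads the open/max file descriptor counts of the current JVM (Unix-like systems only).
final class FileDescriptorCheck {

    /** Returns {open, max}, or null if the platform does not expose these counters. */
    static long[] descriptorUsage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount()};
        }
        return null;
    }

    public static void main(String[] args) {
        long[] usage = descriptorUsage();
        if (usage == null) {
            System.out.println("File descriptor counters are not available on this platform");
            return;
        }
        double ratio = (double) usage[0] / usage[1];
        System.out.printf("open=%d max=%d (%.1f%%)%n", usage[0], usage[1], ratio * 100);
        if (ratio > 0.8) { // hypothetical alert threshold
            System.out.println("Consider raising the limit (ulimit -n) or reviewing the merge settings");
        }
    }
}
```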
Logging
The Administration Console allows you to see log messages (also available in a log file) and change the log settings.
The first feature is useful only if you don't have access to the log files, because inspecting log files with Unix command-line tools is far more powerful than using the AJAX-refreshed page. Managing log settings, on the other hand, is very useful because it requires neither manual edits nor server restarts. So, if you want to limit the priority level of log messages on the fly, or debug the behavior of a component, this is the right place to do so.
Tip: A verbose log level can slow down index operations, so it's better to check the log settings before calling the /update request handler. For the same reason, remember that Solr logs all query requests at the INFO level. Depending on how many users your application has, this could produce a huge number of log messages.
CoreAdmin
The CoreAdmin section is a central point where you can manage registered cores. You can create a new core on the fly (assuming that the core instance and data directories exist on the disk), or you can manage the existing cores one by one, selecting them from the list on the left. The following screenshot shows the CoreAdmin page of the Solr instance set up for this chapter:
The top toolbar contains these buttons:
Button      Description
Unload      Unloads the core. The core will be removed after pending requests are processed.
Rename      Changes the core name. Note that this change will affect the URI endpoints of the core services.
Swap        Swaps two active cores. This is useful for switching between two versions (that is, online and offline versions) of the same core. Note that both of them will still be alive after issuing the swap command.
Reload      Reloads a core. The current core instance will remain available only to satisfy pending requests. This command is useful if some (backward-compatible) changes have been made to the solrconfig.xml or schema.xml configuration files or to the core libraries, and you want to load those changes.
Optimize    Issues an optimize command to the selected core.
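The first four buttons correspond to actions of the CoreAdmin API, which is exposed at /admin/cores. This Java sketch only builds the equivalent request URLs and does not send them. The base URL and the core names (example, example_old, online, and offline) are hypothetical. Optimize works differently: it is not a CoreAdmin action, and an optimize is issued through the core's own update handler.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Builds (but does not send) CoreAdmin API requests matching the toolbar buttons.
final class CoreAdminUrls {
    static final String BASE = "http://127.0.0.1:8983/solr/admin/cores"; // hypothetical host and port

    /** action: UNLOAD, RENAME, SWAP, or RELOAD; other: the second core (SWAP) or the new name (RENAME). */
    static String url(String action, String core, String other) {
        StringBuilder sb = new StringBuilder(BASE)
                .append("?action=").append(action)
                .append("&core=").append(encode(core));
        if (other != null) {
            sb.append("&other=").append(encode(other));
        }
        return sb.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        System.out.println(url("UNLOAD", "example", null));
        System.out.println(url("RENAME", "example", "example_old"));
        System.out.println(url("SWAP", "online", "offline"));
        System.out.println(url("RELOAD", "example", null));
    }
}
```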
The central area shows the following information about the core and the corresponding index:
Attribute     Description
startTime     The core start (or reload) time.
instanceDir   The top core folder. It contains a conf subfolder with the Solr configuration files (schema.xml, solrconfig.xml, and dependent files).
dataDir       The folder containing the index data files.
lastModified  The last modification date of the index.
version       A version number assigned to the IndexReader instance associated with the index.
numDocs       The number of searchable documents in the index. In other words, this is the number of documents you can get back from a *:* query.
maxDocs       The number of internal document identifiers actually in use. The difference between maxDocs and numDocs indicates how many documents have been deleted or replaced. The old (deleted and replaced) identifiers are gradually removed during merges or after an index optimize.
deletedDocs   The number of deleted documents. It also includes replaced documents, because Solr doesn't actually support in-place updates; it simply deletes a given document and then adds its new version. This is basically the difference between maxDocs and numDocs after a commit and before merging or optimizing.
optimized     Indicates whether the index has been optimized.
current       Indicates whether the index has been committed.
directory     The underlying Lucene Directory implementation.
Java properties and thread dump
The Java properties page is a read-only section where you can see the system properties associated with the current JVM instance.
Tip: Remember that you can use those properties in solrconfig.xml, so you may want to check on this page whether a specific property has the expected value.
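Solr configuration files reference properties with the ${name:default} syntax, for example ${solr.data.dir:}. This sketch imitates that substitution for JVM system properties only, which helps when reasoning about what a given placeholder will resolve to. It is not Solr's implementation, since Solr also resolves core-level properties. The property names used in the usage example are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of ${name:default} placeholder resolution against JVM system properties only.
// Not Solr's implementation: Solr also resolves core-level properties.
final class PropertySubstitution {
    // ${name} or ${name:default}; the default may be empty, as in ${solr.data.dir:}
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

    static String resolve(String text) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = System.getProperty(m.group(1));
            if (value == null) {
                value = m.group(2); // fall back to the default, if one was given
            }
            if (value == null) {
                throw new IllegalArgumentException("No value for property " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.setProperty("ch5.example.dir", "/tmp/solr-data"); // hypothetical property
        System.out.println(resolve("<dataDir>${ch5.example.dir:./data}</dataDir>"));
        System.out.println(resolve("<dataDir>${ch5.undefined.prop:./data}</dataDir>"));
    }
}
```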
The thread dump page shows a snapshot of what the live threads in the JVM are doing at a given instant. You can retrieve the same information with the jstack command-line utility that ships with the JDK.
Tip: Thread dumps are very useful for debugging high CPU usage and deadlocks.
Unlike with log analysis, here the console is definitely more user-friendly than manually inspecting the jstack output.
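The same kind of snapshot is available from inside any JVM through the standard java.lang.management API. This sketch prints a jstack-like dump of the JVM it runs in and asks the ThreadMXBean whether any threads are deadlocked. Pointing it at Solr would require a remote JMX connection, which is not shown here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// A jstack-like dump of the current JVM using the standard management API.
final class ThreadDumpSketch {

    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // true, true: include locked monitors and locked ownable synchronizers
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        // findDeadlockedThreads() returns null when no thread is deadlocked
        long[] deadlocked = threads.findDeadlockedThreads();
        out.append(deadlocked == null
                ? "No deadlocks detected"
                : deadlocked.length + " deadlocked thread(s)");
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump());
    }
}
```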
Core overview
Selecting one of the available cores in the drop-down list on the left side of the Administration Console opens a core-dedicated area with several other sections. The first section is an overview of the selected core. It reports more or less the same information that we saw in the dashboard and on the CoreAdmin page.
It also adds information about the health check (heartbeat information, enabled only if you configured the ping request handler) and the replication status.
The replication section shows the index status of the master and the slave (only if the current Solr instance acts as a slave) in terms of replicability.
Tip: The replication section is useful for monitoring master-repeater-slave instances, especially when you get synchronization issues within the Solr ensemble. Note that the console also has a dedicated Replication section where that information is more detailed.
The master-slave replication architecture is explained in the next chapter.
Caches
To speed up query execution, Solr stores data in several types of in-memory caches. Caches transparently store filters, documents, and identifiers so that future requests for the same data can be served faster. If you run the same search twice, the Solr log will show a marked difference in response time between the first and the second query, as shown in the following example:
…params={q=history&fq=catalog:NRA} hits=17298 status=0 QTime=78
…
…params={q=history&fq=catalog:NRA} hits=17298 status=0 QTime=2
Solr comes with several kinds of caches. They can be configured and tuned in solrconfig.xml:
<filterCache class="solr.FastLRUCache" size="512" initialSize="512"
    autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512"
    autowarmCount="0"/>
<documentCache class="solr.LRUCache" size="512" initialSize="512"
    autowarmCount="0"/>
<fieldValueCache class="solr.FastLRUCache" size="512" autowarmCount="128"
    showItems="32"/>
The following table briefly describes the types of caches available in Solr:
Cache             Description
FilterCache       Holds the document identifiers associated with filter queries that have been executed.
QueryResultCache  Holds the document identifiers resulting from queries that have been executed.
DocumentCache     Holds Lucene document instances for quick access to their stored fields.
FieldCache        A low-level Lucene field cache that is not managed by Solr (in other words, it cannot be configured). It is used for sorting and faceting.
FieldValueCache   A field cache very similar to FieldCache, but it can be configured. It is mainly used for faceting.
CustomCache       Application-level caches used to hold custom user/application data.
Cache lifecycles
A cache is always associated with an index searcher instance, and it follows the same lifecycle as that instance. This means that when an index searcher is instantiated (on startup or after a commit), cache instances are created and associated with it. As a consequence, caches and cached objects don't have an expiry time; they remain valid as long as the owning index searcher instance is active.
When a searcher is instantiated, and it is not the first searcher (the one opened at startup), its caches can optionally be auto-warmed. That is, they can be prepopulated with some data coming from their predecessors (the caches of the previous searcher). The autowarmCount attribute declares the maximum amount of data (an absolute number or a percentage) that can be used to prepopulate the new cache.
Note: Data from the previous cache is not taken as it is. It has to be validated against the new searcher's “view” of the index, because an object cached earlier may no longer be valid after the new searcher has been opened; it could have been deleted, for example. The autowarmCount attribute refers only to valid entries.
When a new searcher is opened, the current searcher continues to serve pending requests. After that, it is closed, and the orphaned caches become eligible for garbage collection.
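As a rough illustration of the autowarmCount semantics described above, the following sketch computes how many entries of the old cache would be regenerated. It handles an absolute value (such as 128) or a percentage (such as 50%). The class name and the rounding (truncation) are assumptions, not Solr's code. As the note explains, each regenerated entry is still re-validated against the new searcher.

```java
// Hypothetical helper: how many (most recently used) entries of the old cache
// would be used to warm the new one, for autowarmCount="128" or autowarmCount="50%".
final class AutowarmCount {

    static int entriesToWarm(String autowarmCount, int oldCacheSize) {
        String value = autowarmCount.trim();
        if (value.endsWith("%")) {
            int percent = Integer.parseInt(value.substring(0, value.length() - 1).trim());
            return (int) ((long) oldCacheSize * percent / 100); // truncates toward zero
        }
        return Math.min(Integer.parseInt(value), oldCacheSize);  // cannot warm more than exists
    }

    public static void main(String[] args) {
        System.out.println(entriesToWarm("128", 512)); // 128
        System.out.println(entriesToWarm("128", 40));  // 40
        System.out.println(entriesToWarm("50%", 512)); // 256
    }
}
```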
Cache sizing
Cache size can refer to two different measures: the total count of objects a cache contains at a specific moment, and the maximum number of objects a cache can hold.
Within solrconfig.xml, you can configure the minimum (initial) and maximum size of a cache by means of the initialSize and size attributes, respectively:
<filterCache … class="…" size="512" initialSize="512"/>
The initialSize attribute is used when the cache instance is created. It preallocates a given number of slots for the objects that will be cached.
The ideal size of a cache strictly depends on the application. One could mistakenly think "the bigger, the better", but that is only half true. A huge cache has the advantage of holding all the required structures in memory, which allows fast access to that information. However, unless your index is completely static, you will sooner or later add, update, or remove something, and you will need to commit those changes. A commit opens a new searcher, which in turn creates new caches, and the old huge caches are discarded.
In this situation, the garbage collector will have a lot of work to do to reclaim all the objects in the old caches. Worse, if you have configured auto-warming, prepopulating the newly created caches could take a lot of time.
In other words, this scenario requires a lot of memory to manage all of those objects. In my experience, this is one of the most common ways of getting an OutOfMemoryError. Remember that garbage collection is not under your control, so there will most probably be an interval during which the JVM must hold references to both the new and the old objects.
The suggestion here is to start with the default sizes, and then use the Solr Administration Console to monitor constantly how the caches behave. Cache management is not a do-once-and-forget task. Caches must be monitored periodically and tuned when needed to get the most out of them for your application.
Cached object lifecycle
The class attribute of a cache primarily determines its implementation, but more importantly, it defines how objects are managed within the cache. In other words, the class implements the logic that decides what to do when the cache reaches its maximum size and which objects must be evicted when a new entry arrives.
Solr offers three cache implementations:
LRUCache: Once the cache has reached its maximum size and a new object needs to be cached, this implementation removes the oldest entry. The age of an object is determined by the last time it was requested from the cache.
FastLRUCache: This implements behavior similar to LRUCache, but it can optionally use a separate thread to clean up the oldest entries asynchronously.
LFUCache: This policy evicts entries based on the popularity of each object in the cache (that is, how many times a given object in the cache has been requested).
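The LRU policy is easy to sketch in Java with an access-ordered LinkedHashMap that drops its least recently used entry once a maximum size is exceeded. This illustrates the eviction policy only. Solr's LRUCache and FastLRUCache also handle concurrency, statistics, and auto-warming. The keys in the usage example are sample filter queries.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache: accessOrder=true makes get() move an entry to the tail,
// so the head of the map is always the least recently used entry.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxSize;

    LruCache(int initialSize, int maxSize) {
        super(initialSize, 0.75f, true); // like initialSize/size in solrconfig.xml
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize; // evict the least recently used entry
    }
}

final class LruCacheDemo {
    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2, 2);
        cache.put("genre:Jazz", "1,2,3");
        cache.put("genre:Fusion", "4,5,6,7");
        cache.get("genre:Jazz");            // genre:Jazz becomes the most recently used entry
        cache.put("released:1986", "6,8");  // exceeds maxSize: genre:Fusion is evicted
        System.out.println(cache.keySet()); // [genre:Jazz, released:1986]
    }
}
```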
Cache stats
For each cache, the Administration Console reports (Plugin/Stats | Cache) the following attributes:
Attribute     Description
lookups       The total count of lookup requests.
hits          The number of requests that successfully found the requested object.
hitratio      The ratio of hits to the total number of lookups. A value of 1 represents optimal usage of the cache (every requested object has been found in the cache).
inserts       The total number of inserted objects.
evictions     The total number of evictions (objects removed).
size          The current size of the cache.
warmupTime    The time needed to auto-warm the cache.
cumulative_lookups
cumulative_hits
cumulative_hitratio
cumulative_inserts
cumulative_evictions
A cache instance dies when the associated searcher is discarded. The cumulative attributes accumulate lookups, hits, hitratio, inserts, and evictions across all cache instances of the same type. They measure the same things we just saw, but cumulatively since Solr startup.
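The difference between the per-instance and the cumulative counters can be modeled in a few lines. In this hypothetical sketch, each CacheStats object lives as long as one searcher's cache, while a shared Cumulative object survives commits. Only lookups and hits are tracked; inserts and evictions would work the same way. The lookup sequences in the usage example are made up.

```java
// Hypothetical model of per-instance vs cumulative cache statistics.
final class CacheStats {

    /** Shared by all cache instances of one type; reset only when Solr restarts. */
    static final class Cumulative {
        long lookups;
        long hits;

        double hitratio() { return ratio(hits, lookups); }
    }

    private final Cumulative cumulative;
    private long lookups;
    private long hits;

    CacheStats(Cumulative cumulative) { this.cumulative = cumulative; }

    void recordLookup(boolean hit) {
        lookups++;
        cumulative.lookups++;
        if (hit) {
            hits++;
            cumulative.hits++;
        }
    }

    double hitratio() { return ratio(hits, lookups); }

    static double ratio(long hits, long lookups) {
        return lookups == 0 ? 0.0 : (double) hits / lookups;
    }

    public static void main(String[] args) {
        Cumulative cumulative = new Cumulative();
        CacheStats first = new CacheStats(cumulative);   // cache of the first searcher
        for (boolean hit : new boolean[] {false, true, true, true}) first.recordLookup(hit);
        CacheStats second = new CacheStats(cumulative);  // new cache opened after a commit
        for (boolean hit : new boolean[] {false, false, false, true}) second.recordLookup(hit);
        System.out.println(first.hitratio());      // 0.75
        System.out.println(second.hitratio());     // 0.25
        System.out.println(cumulative.hitratio()); // 0.5
    }
}
```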
Types of cache
As we have briefly described, Solr comes with several kinds of caches. The following paragraphs describe them further.
Filter cache
Each time a filter query is executed, Solr places a new entry in the filter cache. A filter cache is a kind of map where the key is the filter query string (for example, catalog:NRA or genre:Jazz) and the value is a list of all matching document identifiers.
The filter cache is configured in the following fragment of the solrconfig.xml file:
<filterCache class="solr.FastLRUCache" size="512" initialSize="512"
    autowarmCount="0"/>
Filter queries play a crucial role in optimizing performance and response time. The cached identifiers can be reused by subsequent queries; in short, requests that contain cached filter queries perform better because those filter queries are not actually executed again.
Auto-warming a filter cache means refreshing every cached filter query result by executing all of those queries again against the index view represented by the new searcher. Let's see this with a concrete example. The sample Solr instance contains 24 albums, and at startup time, the filter cache is empty. Now let's suppose the following queries are executed:
http://127.0.0.1:8983/solr/example/query?q=*:*&fq=genre:Jazz (3 results)
http://127.0.0.1:8983/solr/example/query?q=*:*&fq=genre:Fusion (4 results)
http://127.0.0.1:8983/solr/example/query?q=*:*&fq=released:1986 (2 results)
The three filter queries populate the filter cache as described in the following table:
Cache entries (filter queries)   Query results (document identifiers)
genre:Jazz                       1, 2, 3
genre:Fusion                     4, 5, 6, 7
released:1986                    6, 8
Now we decide to remove document #6. To do this, we send a delete command and then a commit command. Once the change has been committed, document #6 no longer exists. A new searcher is opened, and the cache content needs to be refreshed because it still contains an invalid entry. So, the auto-warming process simply repeats each filter query in the cache (genre:Jazz, genre:Fusion, and released:1986 in this case) and refreshes the content with valid query results. After the auto-warming, the filter cache has the following content:
Cache entries (filter queries)   Query results (document identifiers)
genre:Jazz                       1, 2, 3
genre:Fusion                     4, 5, 7
released:1986                    8
This re-execution is, in general, the cost of auto-warming, and it is directly related to the cache size: in most cases, a huge cache takes some time to re-execute all the cached queries.
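The worked example above can be reproduced with a small Java sketch. The index is a hypothetical in-memory stand-in that contains only documents 1 to 8. Their genre and released values are chosen to match the tables, whereas the real sample has 24 albums. Filter queries are limited to the simple field:value form. Real filter cache entries hold sets of internal Lucene document IDs rather than these identifiers.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Reproduces the filter cache auto-warming example with a toy in-memory "index".
final class FilterCacheWarming {

    /** Executes a "field:value" filter query against the given index view. */
    static Set<Integer> execute(String fq, Map<Integer, Map<String, String>> index) {
        String[] fieldAndValue = fq.split(":", 2);
        Set<Integer> ids = new TreeSet<>();
        for (Map.Entry<Integer, Map<String, String>> doc : index.entrySet()) {
            if (fieldAndValue[1].equals(doc.getValue().get(fieldAndValue[0]))) {
                ids.add(doc.getKey());
            }
        }
        return ids;
    }

    /** Auto-warming: re-executes every cached filter query against the new searcher's view. */
    static Map<String, Set<Integer>> autowarm(Map<String, Set<Integer>> oldCache,
                                              Map<Integer, Map<String, String>> newIndex) {
        Map<String, Set<Integer>> newCache = new LinkedHashMap<>();
        for (String fq : oldCache.keySet()) {
            newCache.put(fq, execute(fq, newIndex));
        }
        return newCache;
    }

    /** Hypothetical documents 1..8, consistent with the tables above. */
    static Map<Integer, Map<String, String>> sampleIndex() {
        String[] genres = {"Jazz", "Jazz", "Jazz", "Fusion", "Fusion", "Fusion", "Fusion", "Rock"};
        Map<Integer, Map<String, String>> index = new TreeMap<>();
        for (int id = 1; id <= genres.length; id++) {
            Map<String, String> doc = new HashMap<>();
            doc.put("genre", genres[id - 1]);
            doc.put("released", (id == 6 || id == 8) ? "1986" : "1979");
            index.put(id, doc);
        }
        return index;
    }

    public static void main(String[] args) {
        Map<Integer, Map<String, String>> index = sampleIndex();
        Map<String, Set<Integer>> cache = new LinkedHashMap<>();
        for (String fq : Arrays.asList("genre:Jazz", "genre:Fusion", "released:1986")) {
            cache.put(fq, execute(fq, index)); // first execution populates the cache
        }
        System.out.println(cache); // {genre:Jazz=[1, 2, 3], genre:Fusion=[4, 5, 6, 7], released:1986=[6, 8]}

        index.remove(6);                // delete document #6 and commit
        cache = autowarm(cache, index); // the new searcher's cache is warmed from the old one
        System.out.println(cache); // {genre:Jazz=[1, 2, 3], genre:Fusion=[4, 5, 7], released:1986=[8]}
    }
}
```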
QueryResult cache
With this kind of cache, each time a query is executed, its results (in terms of matching document identifiers) are cached for future reuse. The cache is configured in the following fragment of the solrconfig.xml file:
<queryResultCache class="solr.FastLRUCache" size="512" initialSize="512"
    autowarmCount="0"/>
The rationale is that popular queries (that is, queries that are often repeated) gain a clear advantage here, because they are not actually executed again; their results have already been computed.
Note: Besides popular queries, pagination mechanisms also benefit from this cache. When the user asks for the next or the previous page of results for a given query, Solr repeats the query with a different start parameter.
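The pagination benefit can be sketched as follows. If the cache key is built from the query, filters, and sort but not from start and rows, every page of the same search is served from one cached list of identifiers. This is a simplified, hypothetical model. For example, Solr caches only a window of identifiers (see the queryResultWindowSize setting) rather than the full result list. The searcher function is also a stand-in.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Simplified query result cache: the key ignores start/rows, so paging reuses one cached id list.
final class QueryResultCacheSketch {
    private final Map<String, List<Integer>> cache = new HashMap<>();
    private final Function<String, List<Integer>> searcher; // hypothetical: query -> matching ids
    private int executions;

    QueryResultCacheSketch(Function<String, List<Integer>> searcher) {
        this.searcher = searcher;
    }

    List<Integer> page(String q, String fq, String sort, int start, int rows) {
        String key = q + "|" + fq + "|" + sort; // start and rows are deliberately not part of the key
        List<Integer> ids = cache.get(key);
        if (ids == null) {
            executions++;                       // cache miss: the query is actually executed
            ids = searcher.apply(q + " " + fq);
            cache.put(key, ids);
        }
        int from = Math.min(start, ids.size());
        int to = Math.min(start + rows, ids.size());
        return new ArrayList<>(ids.subList(from, to));
    }

    int executions() { return executions; }

    public static void main(String[] args) {
        QueryResultCacheSketch qrc = new QueryResultCacheSketch(query -> {
            List<Integer> ids = new ArrayList<>();
            for (int i = 1; i <= 45; i++) ids.add(i); // pretend 45 documents match
            return ids;
        });
        System.out.println(qrc.page("history", "catalog:NRA", "score desc", 0, 10));  // first page
        System.out.println(qrc.page("history", "catalog:NRA", "score desc", 10, 10)); // next page, served from the cache
        System.out.println("executions: " + qrc.executions());                        // executions: 1
    }
}
```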
Document cache
Both FilterCache and QueryResultCache store document identifiers. So, for a given query, Solr computes the matching identifiers and then, for each of them, queries the index to retrieve its stored fields. After that, the response is populated with those documents and their corresponding (stored) fields.
DocumentCache caches Lucene documents. Once a query has been executed, Solr therefore doesn't need to query the index again for documents found in this cache when it populates the list of results.
Tip: If you have huge stored fields (for example, full-text fields used for highlighting), be aware that you cannot specify which fields must be in the cache. Therefore, huge fields may require a lot of memory.
Field value cache
The field value cache has a map structure where keys are field names and values are uninverted fields. An uninverted field maps document identifiers to values. If the cache is not explicitly declared, it is automatically created with an initial size of 10, a maximum size of 10000, and no auto-warming. It is primarily used for faceting.
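This hypothetical Java sketch illustrates what "uninverted" means. It turns postings (term to document IDs, that is, the inverted index) into a per-document list of terms, which is the kind of structure the field value cache holds for a field. It then uses that structure to count facet values for a set of matching documents. The genre postings reuse the filter cache example and are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Un-inverting a field (term -> doc ids becomes doc id -> terms), then faceting with it.
final class UninvertSketch {

    static Map<Integer, List<String>> uninvert(Map<String, List<Integer>> postings) {
        Map<Integer, List<String>> byDoc = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : postings.entrySet()) {
            for (int doc : entry.getValue()) {
                byDoc.computeIfAbsent(doc, d -> new ArrayList<>()).add(entry.getKey());
            }
        }
        return byDoc;
    }

    /** Facet counts: for each matching document, look up its values and count them. */
    static Map<String, Integer> facet(Map<Integer, List<String>> byDoc, Collection<Integer> matches) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int doc : matches) {
            for (String term : byDoc.getOrDefault(doc, Collections.<String>emptyList())) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> genrePostings = new LinkedHashMap<>();
        genrePostings.put("Jazz", Arrays.asList(1, 2, 3));
        genrePostings.put("Fusion", Arrays.asList(4, 5, 7));
        Map<Integer, List<String>> byDoc = uninvert(genrePostings);
        System.out.println(byDoc);                                // {1=[Jazz], 2=[Jazz], 3=[Jazz], 4=[Fusion], 5=[Fusion], 7=[Fusion]}
        System.out.println(facet(byDoc, Arrays.asList(2, 3, 4))); // {Fusion=1, Jazz=2}
    }
}
```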
Custom cache
Custom caches are intended for developers who write their own Solr extensions. Unlike the other types, custom caches accept a regenerator attribute, which declares a class that implements the auto-warming logic for the cache.
Query handlers
The page accessed by navigating to Plugin/Stats | QueryHandler shows an expandable list where each item is a query handler configured in solrconfig.xml. This list includes handlers that represent search endpoints (that is, SearchHandler), but also other handlers such as /admin/ping, /admin/dump, and /debug.
The configured UpdateRequestHandler instances (for example, /update and /update/json) are subclasses of RequestHandler, so they are also listed on this page.
For each handler, the console shows some basic attributes, such as the class name, version, and a short description, together with a set of statistical data, as listed in the following table:
Attribute               Description
handlerStart            The date (in milliseconds) when the handler received its first request.
requests                The total number of requests received.
errors                  The number of requests that raised an exception during execution.
timeouts                If a query is executed with the timeAllowed parameter and the given timeout expires, Solr returns only partial results. This attribute counts the requests that hit this scenario.
totalTime               The total execution time of all requests.
avgRequestsPerSecond    The average number of requests per second.
5minRateReqsPerSecond, 15minRateReqsPerSecond
                        The average number of requests per second over the last five and fifteen minutes, respectively.
avgTimePerRequest       The average execution time per request.
75thPcRequestTime, 95thPcRequestTime, 99thPcRequestTime, 999thPcRequestTime
                        Computed from the distribution of request execution times, these attributes report the value at the 75th, 95th, 99th, and 99.9th percentiles of that distribution, respectively.
So, especially for search endpoints, this page is very useful for understanding and monitoring the usage and statistical behavior of your Solr instance.
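If you want to check these figures against your own measurements, here is the nearest-rank method of computing a percentile in Java. It is a simple textbook definition and not necessarily the estimator behind Solr's statistics. Percentiles are expressed in per mille so that the 99.9th percentile is exact in integer arithmetic. The request times in the usage example are hypothetical.

```java
import java.util.Arrays;

// Nearest-rank percentile: the smallest sample such that at least p of the samples are <= it.
final class RequestTimePercentiles {

    /** perMille: 750 = 75th, 950 = 95th, 990 = 99th, 999 = 99.9th percentile. */
    static long percentile(long[] requestTimesMs, int perMille) {
        if (requestTimesMs.length == 0 || perMille <= 0 || perMille > 1000) {
            throw new IllegalArgumentException("need samples and 0 < perMille <= 1000");
        }
        long[] sorted = requestTimesMs.clone();
        Arrays.sort(sorted);
        // rank = ceil(perMille * n / 1000), computed with integers to avoid rounding surprises
        int rank = (int) ((perMille * (long) sorted.length + 999) / 1000);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        long[] times = new long[1000];
        for (int i = 0; i < times.length; i++) {
            times[i] = i + 1; // hypothetical request times: 1 ms .. 1000 ms
        }
        for (int p : new int[] {750, 950, 990, 999}) {
            System.out.println(p / 10.0 + " percentile: " + percentile(times, p) + " ms");
        }
    }
}
```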
Update handlers
Under the same path (Plugin/Stats), the UpdateHandler page contains an entry corresponding to the org.apache.solr.update.DirectUpdateHandler2 instance.
The following table lists and describes the attributes of that handler:
Attribute              Description
commits                The total number of commit requests received.
autocommit maxTime     The maximum amount of time allowed to pass after a document is added before a new commit is automatically triggered.
autocommits            The total number of hard auto-commits executed.
soft autocommits       The total number of soft auto-commits executed.
optimizes              The total number of optimize requests received.
rollbacks              The total number of rollback requests received.
expungeDeletes         The total number of hard commits with the expungeDeletes flag set to true.
docsPending            The total number of updates that have been processed but not committed.
adds                   The total number of add requests received.
deletesById            The total number of deleteById requests received.
deletesByQuery         The total number of deleteByQuery requests received.
errors                 The total number of failed operations (for example, updates, commits, and rollbacks).
cumulative_adds
cumulative_deletesById
cumulative_deletesByQuery
cumulative_errors
UpdateHandler has a lifecycle bound to the owning SolrCore. In other words, when SolrCore is reloaded, a new instance of UpdateHandler is created. The attributes prefixed with cumulative measure a specific quantity (for example, additions and deletions) cumulatively since Solr startup.
Most Solr installations I've set up in libraries update the index on a daily basis. Each morning, the UpdateHandler stats page shows a perfect summary of what happened during the previous day and cumulatively since the last startup. Of course, when there are errors, the log files are my best friends.
On the other hand, if I need to monitor the overall progress of an index update in real time, I prefer the JMX way, which is described in the next section.
JMX
Java Management Extensions (JMX) is a powerful set of APIs used to monitor and manage a running JVM. The building blocks of JMX are the so-called Management Beans (MBeans), which are basically wrappers that decorate existing objects with a management interface. The core classes of the JVM are decorated with MBeans.
Tip: More information about JMX can be found at http://www.oracle.com/technetwork/java/javase/tech/javamanagement-140525.html.
MBeans are registered with an MBeanServer that exposes their management interfaces to external clients. Applications are free to create, register, and expose the management interfaces of their own specific services. Solr MBeans are not automatically registered with the MBeanServer; to register them, just write (or uncomment) the following line in solrconfig.xml:
<jmx/>
The JDK comes with two built-in JMX clients called JConsole and JVisualVM.
Tip: JVisualVM and JConsole are very similar tools. Here, we will talk only about JConsole, because JVisualVM doesn't have the MBeans perspective out of the box (it requires the VisualVM-MBeans plugin).
Open a shell on your PC and type the following command:
# $JAVA_HOME/bin/jconsole
A dialog will pop up. This is the first screen of JConsole, which is a standalone Java application. The dialog contains a list of locally running JVMs, one of which should be the JVM running Solr. Select that entry, and you should see a screen with several tabs: Overview, Memory, Threads, Classes, VM Summary, and MBeans. At the moment, we are interested in the last tab, MBeans. Here, the tree component on the left side shows all the registered MBeans, as depicted in the following screenshot:
For each MBean in the tree, you can see its management interface in the right pane. A management interface is composed of attributes and operations.
You can invoke operations, and you can monitor attributes by looking at their value at a given moment or over a given interval. For the latter, double-click on an attribute value to activate a real-time chart.
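What JConsole does when it charts an attribute can be reproduced with the javax.management API. This sketch uses the platform MBeanServer of the JVM it runs in and samples the HeapMemoryUsage attribute of the standard java.lang:type=Memory MBean once per second. It also counts the registered MBeans whose domain starts with solr. Monitoring Solr itself would require connecting to Solr's JVM through javax.management.remote.JMXConnectorFactory, with remote JMX enabled on that JVM. The solr*:* pattern is an assumption about how Solr names its MBean domains, and in this standalone JVM it matches nothing.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

// Polls an MBean attribute at a fixed interval, like a JConsole real-time chart.
final class AttributePoller {

    /** Reads the "used" value of the HeapMemoryUsage attribute (a CompositeData). */
    static long heapUsed(MBeanServer server) throws Exception {
        ObjectName memory = new ObjectName("java.lang:type=Memory");
        CompositeData usage = (CompositeData) server.getAttribute(memory, "HeapMemoryUsage");
        return (Long) usage.get("used");
    }

    /** Lists the MBeans matching an ObjectName pattern, for example "java.lang:*". */
    static Set<ObjectName> find(MBeanServer server, String pattern) throws Exception {
        return server.queryNames(new ObjectName(pattern), null);
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        System.out.println(find(server, "solr*:*").size() + " MBeans in solr* domains");
        for (int i = 0; i < 5; i++) {
            System.out.printf("heap used: %,d bytes%n", heapUsed(server));
            Thread.sleep(1000); // sampling interval
        }
    }
}
```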
The main differences between the Solr Administration Console and JConsole are as follows:
The Solr Administration Console, being a web application, offers static snapshots of the system. With JConsole, you can activate real-time monitoring of one or more attributes.
This is not limited to MBean attributes. In the other tabs, you can monitor threads, processors, memory, and garbage collection.
JConsole has a finer level of granularity than the Administration Console. There, you can see all the attributes and operations exposed for management.
JConsole, being more technical, is less user-friendly than the Administration Console.
Clearly, JConsole, JVisualVM, and the Solr Administration Console are not alternatives to one another. They should be used together to get different perspectives on the system.
Summary
In this chapter, we described some concepts of Solr administration and monitoring. We introduced a few system administration tools, such as the Solr Administration Console and JConsole, and we covered hardware resources.
The topics covered in this chapter are mainly relevant to an administrator. Remember, however, that nowadays this role is spread among several people (especially in small and medium companies) who are mostly developers, since a developer in a small or medium company is a kind of “factotum”. This is why it is important for non-administrators to have at least a basic understanding of administration, management, and monitoring.
In the next chapter, you will see how Solr can be deployed in the context of development, testing, and production. We will illustrate and describe several deployment scenarios. We will start from the simplest, a standalone instance, continue with a gradually growing level of complexity, and end with SolrCloud. SolrCloud is a highly available, fault-tolerant cluster of Solr servers that provides distributed indexing and search capabilities.
Chapter 6. Deployment Scenarios
This chapter describes the various ways in which you can deploy Solr, including the key features and the pros and cons of each scenario.
Solr has a wide range of deployment alternatives, from monolithic to distributed indexes and from standalone to clustered instances. We will organize this chapter by deployment scenario, with a growing level of complexity.
This chapter will cover the following topics:
Sharding
Replication: master, slave, and repeaters
SolrCloud
Standalone instance
All the examples in the previous chapters use a standalone instance of Solr. That is, one or more cores are managed by a Solr deployment hosted in a standalone servlet container (for example, Jetty, Tomcat, and so on).
This kind of deployment is useful for development because, as you learned, it is very easy to start and debug. It can also be suitable for a production context if you don't have strict non-functional requirements and have a small or medium amount of data.
Tip: I have used a standalone instance to provide autocomplete services for small and medium intranet systems.
In any case, the main features of this kind of deployment are simplicity and maintainability; one single node acts as both indexer and searcher. The following diagram depicts a standalone instance with two cores:
Shards
When a monolithic index becomes too large for a single node, or when additions, deletions, or queries take too long to execute, the index can be split into multiple pieces called shards.
Note: The previous sentence describes a logical and theoretical evolution path of a Solr index. In general, however, this applies to all the scenarios we will describe. It is strongly recommended that you analyze your data and its estimated growth in advance, so that you choose the right configuration for your requirements from the beginning. Although it is possible to split an existing index into shards (https://lucene.apache.org/core/4_10_3/misc/org/apache/lucene/index/PKIndexSplitter.html), things definitely become easier if you start directly with a distributed index (if you need it, of course).
The index is split so that each shard contains a disjoint subset of the documents in the entire index. Solr queries those shards and merges the results. The following diagram illustrates a Solr deployment with three nodes; this deployment consists of two cores (C1 and C2) divided into three shards (S1, S2, and S3):
When using shards, only query requests are distributed. This means that the indexer is responsible for adding the data and distributing it across nodes. It must also forward any subsequent change request (that is, delete, replace, and commit) for a given document to the appropriate shard, the one that owns the document.
Tip: The Solr Wiki recommends a simple, hash-based algorithm to determine the shard where a given document should be indexed:
documentId.hashCode() % numServers
This approach also tells you in advance where to send delete or update requests for a given document.
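In Java, hashCode() can be negative, and so can the result of the % operator. A literal implementation of the formula above may therefore produce a negative shard index for some identifiers. This sketch uses Math.floorMod instead, which always yields a value between 0 and numServers - 1. The document identifiers come from the sample response in this section, and the two-server list is hypothetical.

```java
// Hash-based routing as suggested above, made safe for negative hash codes.
final class ShardRouter {

    /** Index (0..numServers-1) of the shard that owns documentId. */
    static int shardFor(String documentId, int numServers) {
        // documentId.hashCode() % numServers can be negative; floorMod never is (for numServers > 0)
        return Math.floorMod(documentId.hashCode(), numServers);
    }

    public static void main(String[] args) {
        String[] servers = {"localhost:8080/solr/c1", "localhost:8081/solr/c2"}; // hypothetical
        for (String id : new String[] {"9920", "4392"}) {
            System.out.println(id + " -> " + servers[shardFor(id, servers.length)]);
        }
    }
}
```

The same function must be used for adds, deletes, and updates, so that every change request for a document reaches the shard that owns it.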
On the other side, a searcher client can send a query request to any node, but it has to specify an additional shards parameter that declares the target shards to be queried. In the following example, two shards are hosted on two servers listening on ports 8080 and 8081. The same request produces the same result whichever node it is sent to:
http://localhost:8080/solr/c1/query?q=*:*&shards=localhost:8080/solr/c1,localhost:8081/solr/c2
http://localhost:8081/solr/c2/query?q=*:*&shards=localhost:8080/solr/c1,localhost:8081/solr/c2
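Conceptually, the node that receives the request collects the top hits from each shard and keeps the overall best ones. This Java sketch shows that merge step with a min-heap over (id, score) pairs. It is a simplification: Solr's distributed search also runs a second phase to fetch the stored fields of the winning documents. The ids and scores are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Merges per-shard top hits into a global top-N by score (simplified distributed search).
final class ShardMerge {

    static final class Hit {
        final String id;
        final float score;
        final String shard;

        Hit(String id, float score, String shard) {
            this.id = id;
            this.score = score;
            this.shard = shard;
        }

        @Override
        public String toString() {
            return id + "@" + shard + "(" + score + ")";
        }
    }

    static List<Hit> mergeTopN(List<List<Hit>> perShard, int rows) {
        // Min-heap on score: the head is the weakest of the current best `rows` hits
        PriorityQueue<Hit> best = new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
        for (List<Hit> shardHits : perShard) {
            for (Hit hit : shardHits) {
                best.offer(hit);
                if (best.size() > rows) {
                    best.poll(); // drop the lowest-scoring hit
                }
            }
        }
        List<Hit> merged = new ArrayList<>(best);
        merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return merged;
    }

    public static void main(String[] args) {
        List<Hit> shard1 = Arrays.asList(new Hit("9920", 0.9f, "c1"), new Hit("1001", 0.5f, "c1"));
        List<Hit> shard2 = Arrays.asList(new Hit("4392", 0.8f, "c2"), new Hit("2002", 0.7f, "c2"));
        System.out.println(mergeTopN(Arrays.asList(shard1, shard2), 3));
    }
}
```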
When sending a query request, a client can optionally request a pseudo field associated with the [shard] transformer. In that case, each returned document includes additional information indicating the shard that owns it. This is an example of such a request:
http://localhost:8080/solr/c1/query?q=*:*&shards=localhost:8080/solr/c1,localhost:8081/solr/c2&fl=*,src_shard:[shard]
Here is the corresponding response (note the pseudo field aliased as src_shard):
<result name="response" numFound="192" start="0">
  <doc>
    <str name="id">9920</str>
    <str name="brand">Fender</str>
    <str name="model">Jazz Bass</str>
    <arr name="artist">
      <str>Marcus Miller</str>
    </arr>
    <str name="series">Marcus Miller signature</str>
    <str name="src_shard">localhost:8080/solr/shard1</str>
  </doc>
  …
  <doc>
    <str name="id">4392</str>
    <str name="brand">Music Man</str>
    <str name="model">StingRay</str>
    <arr name="artist"><str>Tony Levin</str></arr>
    <str name="series">5 strings DeLuxe</str>
    <str name="src_shard">localhost:8081/solr/shard2</str>
  </doc>
</result>
The following are a few things to keep in mind when using this deployment scenario:
The schema must have a uniqueKey field. This field must be declared as stored and indexed; in addition, it is supposed to be unique across all shards.
Inverse Document Frequency (IDF) calculations cannot be distributed. IDF is computed per shard.
Joins between documents belonging to different shards are not supported.
If a shard receives both index and query requests, the index may change during a query execution, thus compromising the outgoing results (for example, a matching document that has been deleted).
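The following sketch shows why per-shard IDF matters. It applies the classic Lucene formula to hypothetical term statistics for two shards. This is the formula used by DefaultSimilarity in Lucene 4.x: idf = 1 + ln(numDocs / (docFreq + 1)). The class name and the numbers are assumptions made for illustration.

```java
/** Worked example: the same term gets different IDF values on different shards. */
final class PerShardIdf {

    /** Classic Lucene TF-IDF idf: 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(long docFreq, long numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        // Hypothetical statistics: the term is common on shard 1 and rare on shard 2
        long docs1 = 100, df1 = 50;
        long docs2 = 100, df2 = 1;
        System.out.printf("shard1 idf = %.3f%n", idf(df1, docs1));
        System.out.printf("shard2 idf = %.3f%n", idf(df2, docs2));
        // What a single, non-distributed index containing both shards would compute
        System.out.printf("global idf = %.3f%n", idf(df1 + df2, docs1 + docs2));
    }
}
```

With these numbers, a match on the second shard gets a much larger IDF than the same match on the first. The merged ranking therefore depends partly on where a document happens to live. The effect tends to shrink when documents are spread evenly (for example, by hashing) across shards.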
Master/slaves scenario
In a master/slaves scenario, there are two types of Solr servers: an indexer (the master) and one or more searchers (the slaves).
The master is the server that manages the index. It receives update requests and applies those changes. A searcher, on the other hand, is a Solr server that exposes search services to external clients.
The index, in terms of data files, is replicated from the indexer to the searchers over HTTP by means of a built-in RequestHandler that must be configured on both the indexer side and the searcher side (within the solrconfig.xml configuration file).
On the indexer (master), a replication configuration looks like this:
<requestHandler
    name="/replication"
    class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">optimize</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>
The replication mechanism can be configured to be triggered after one of the following events:
Commit: A commit has been applied
Optimize: The index has been optimized
Startup: The Solr instance has started
In the preceding example, we want the index to be replicated after startup and optimize commands. Using the confFiles parameter, we can also indicate a set of configuration files (schema.xml and stopwords.txt, in the example) that must be replicated together with the index.
Note
Remember that changes to those files don't trigger any replication. Only a change in the index, in conjunction with one of the events we defined in the replicateAfter parameter, will mark the index (and the configuration files) as replicable.
On the searcher side, the configuration looks like the following:
<requestHandler
    name="/replication"
    class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://<localhost>:<port>/solrmaster</str>
    <str name="pollInterval">00:00:10</str>
  </lst>
</requestHandler>
You can see that a searcher periodically polls the master (the pollInterval parameter, expressed as HH:mm:ss, so 00:00:10 means every 10 seconds) to check whether a newer version of the index is available. If it is, the searcher will start the replication mechanism by issuing a request to the master, which is completely unaware of the searchers.
The replicability status of the index is actually indicated by a version number. If the searcher has the same version as the master, it means the index is the same. If the versions are different, it means that a newer version of the index is available on the master, and replication can start.
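The polling logic can be modeled with a few lines of Java. This is a toy sketch with hypothetical names, not the ReplicationHandler code. It only shows how a pollInterval value in HH:mm:ss form maps to a duration, and the version comparison that decides whether a searcher should fetch the index. The real handler also performs the HTTP calls and the file transfer.

```java
import java.time.Duration;

/** Toy model of a searcher's replication poll (hypothetical names, not Solr code). */
final class ReplicationPollSketch {

    /** Parses a pollInterval value in HH:mm:ss form (for example "00:00:10"). */
    static Duration parsePollInterval(String value) {
        String[] parts = value.split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected HH:mm:ss but got " + value);
        }
        return Duration.ofHours(Long.parseLong(parts[0]))
                .plusMinutes(Long.parseLong(parts[1]))
                .plusSeconds(Long.parseLong(parts[2]));
    }

    /** A searcher fetches the index only when the master advertises a different version. */
    static boolean shouldReplicate(long masterVersion, long searcherVersion) {
        return masterVersion != searcherVersion;
    }

    public static void main(String[] args) {
        Duration interval = parsePollInterval("00:00:10");
        System.out.println("Polling every " + interval.getSeconds() + " seconds");
        System.out.println(shouldReplicate(42L, 42L)); // same version: nothing to fetch
        System.out.println(shouldReplicate(43L, 42L)); // newer index on the master
    }
}
```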
Other than separating responsibilities, this deployment configuration allows us to have a so-called diamond architecture, consisting of one indexer and several searchers. When the replication is triggered, each searcher will receive a whole copy of the index. This allows the following:
Load balancing of the incoming (query) requests.
Increased availability of the whole system: in the event of a server crash, the other searchers will continue to serve the incoming requests.
The following diagram illustrates a master/slave deployment scenario with one indexer, three searchers, and two cores:
If the searchers are in several geographically distributed data centers, an additional role called repeater can be configured in each data center in order to rationalize the replication data traffic between nodes. A repeater is simply a node that acts as both a master and a slave. It is a slave of the main master, and at the same time, it acts as the master of the searchers within the same data center, as shown in this diagram:
Shards with replication
This scenario combines shards and replication in order to have a scalable system with high throughput and availability. There is one indexer and one or more searchers for each shard, allowing load balancing between (query) shard requests. The following diagram illustrates a scenario with two cores, three shards, one indexer, and (due to problems with available space) only one searcher for each shard:
The drawback of this approach is undoubtedly the overall growing complexity of the system, which requires more effort in terms of maintainability, manageability, and system administration. In addition to this, each searcher is an independent node, and we don't have a central administration console where a system administrator can get a quick overview of system health.
These disadvantages have been either mitigated or overcome in SolrCloud, which is described in the next section.
SolrCloud
SolrCloud is a highly available, fault-tolerant cluster of Solr servers that provides distributed indexing and search capabilities. The following diagram illustrates a simple SolrCloud scenario:
Although SolrCloud introduced a new terminology to define things in a distributed domain, the preceding diagram has been drawn with the same concepts that we saw in the previous scenarios, for better understanding.
Tip
Starting from Solr 4.10.0, the download bundle contains an interactive, wizard-like command-line setup for a sample SolrCloud installation. A step-by-step guide for this is available at https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud.
The following sections will describe the relevant aspects of SolrCloud.
Cluster management
Apache ZooKeeper was introduced in SolrCloud for cluster coordination and configuration. This means it is a central actor in this scenario, providing discovery, configuration, and lookup services that other components (including clients) use to gather information about the Solr cluster.
Apache ZooKeeper, being a central component, can be organized in a cluster itself (as depicted in the previous diagram) in order to avoid a single point of failure. A cluster of ZooKeeper nodes is called an ensemble.
Tip
For more information about Apache ZooKeeper, visit http://zookeeper.apache.org, the project home page.
Replication factor, leaders, and replicas
In the preceding diagram, we have only one core (C1) with three shards (S1, S2, and S3). Now, the main difference between the previous distributed scenario (where we met shards) and this scenario is that here, there's a copy of each shard on every node. That copy is called a replica. In this example, we have three copies of each shard, but this is just for simplicity; you can have as many copies as you want.
More specifically, SolrCloud has a property called replication factor that determines the total number of copies of each shard in the cluster. Among the copies, one is elected as the leader (the letter “L” on C1/S1 on the first node), while the remaining ones are replicas (the letter “R”).
Tip
In the preceding diagram, the replication factor is 3 and it is equal to the number of nodes. Keep in mind that this is a coincidence; those measures could be different, and they actually depend on your cluster configuration and needs.
This replication feature satisfies three important nonfunctional requirements: load balancing, high availability, and backup. We have already described how the classic replication mechanism provides load balancing: having the same data on more than one node allows a client to issue query requests to those nodes in a round-robin fashion, thus expanding the overall capacity of the system in terms of queries per second. Here, the context is the same. The content of each shard, whether held by the leader or by a replica, can be found on n nodes (where n is the replication factor), so a client can use those nodes for load balancing requests.
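Round-robin selection is simple in itself. The following is a minimal, thread-safe sketch in which the class name and node URLs are hypothetical. In a real Java client, you would more likely rely on Solrj, for example, LBHttpSolrServer or the CloudSolrServer mentioned later in this chapter. Those clients also take care of failover.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Minimal round-robin selector over nodes that hold the same shard data (hypothetical). */
final class RoundRobinBalancer {

    private final List<String> nodes;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.nodes = List.copyOf(nodes);
    }

    /** Returns the next node in turn; floorMod keeps the index valid even after int overflow. */
    String pick() {
        return nodes.get(Math.floorMod(next.getAndIncrement(), nodes.size()));
    }

    public static void main(String[] args) {
        RoundRobinBalancer lb = new RoundRobinBalancer(List.of(
                "http://node1:8983/solr", "http://node2:8983/solr", "http://node3:8983/solr"));
        for (int i = 0; i < 4; i++) {
            System.out.println(lb.pick()); // node1, node2, node3, then node1 again
        }
    }
}
```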
High availability is a direct consequence of the redundancy introduced with shard replication. The presence of the same data (and the same search services) on several nodes means that, even if one of those nodes crashes, a client can continue to send requests to the remaining nodes.
The redundancy introduced with replication also works as a backup mechanism. Having the same things in several places provides a better guarantee against data loss. After all, this is the underlying principle of popular cloud data services (for example, Dropbox, iCloud, and Copy).
Durability and recovery
Each node maintains a write-ahead transaction log, where any change is recorded before being applied to the index. Therefore, the transaction log is available for leaders and replicas, and it will be used to determine which content needs to be part of a chosen replica during synchronization. For instance, when a new replica is created, it refers to its leader and its transaction log to know which content to get.
The transaction log will also be used when restarting a server that didn't shut down gracefully. Its content will be “replayed” in order to synchronize local leaders and replicas.
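The write-ahead principle (log first, apply second, replay after a crash) can be illustrated with a toy model. This sketch is hypothetical and greatly simplified. Solr's transaction log has its own format and recovery rules, whereas this sketch simply replays the whole log to rebuild the in-memory state.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy write-ahead log: every change is logged before it is applied (not Solr's tlog). */
final class WriteAheadLogSketch {

    /** One logged change: a document id and its new value (null means delete). */
    static final class Op {
        final String id;
        final String value;

        Op(String id, String value) {
            this.id = id;
            this.value = value;
        }
    }

    final List<Op> log = new ArrayList<>();            // durable part: survives a crash
    Map<String, String> index = new LinkedHashMap<>(); // volatile part: rebuilt on recovery

    /** Logs the change first, then applies it; applyNow=false simulates a crash in between. */
    void update(String id, String value, boolean applyNow) {
        Op op = new Op(id, value);
        log.add(op);
        if (applyNow) {
            apply(index, op);
        }
    }

    private static void apply(Map<String, String> target, Op op) {
        if (op.value == null) {
            target.remove(op.id);
        } else {
            target.put(op.id, op.value);
        }
    }

    /** Rebuilds the index by replaying every logged change, in order. */
    void recover() {
        Map<String, String> rebuilt = new LinkedHashMap<>();
        for (Op op : log) {
            apply(rebuilt, op);
        }
        index = rebuilt;
    }

    public static void main(String[] args) {
        WriteAheadLogSketch node = new WriteAheadLogSketch();
        node.update("9920", "Jazz Bass", true);
        node.update("4392", "StingRay", false); // logged, then the node "crashes" before applying it
        System.out.println("before recovery: " + node.index);
        node.recover();
        System.out.println("after recovery:  " + node.index);
    }
}
```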
Tip
Write-ahead logging is widely used in distributed systems. For more information about how SolrCloud uses it, see https://cwiki.apache.org/confluence/display/solr/NRT%2C+Replication%2C+and+Disaster+Recovery+with+SolrCloud
The transaction log path can be configured in the <updateLog> section (inside <updateHandler>) of the solrconfig.xml file.
The new terminology
Now that the main features of SolrCloud have been explained, we can stop thinking about it as an evolution of the shard scenario and cover its own terminology:
Term  Description
Node  A Java Virtual Machine running Solr.
Cluster  A set of Solr nodes that form a single unit of service.
Shard  We previously defined a shard as a vertical subset of the index, that is, a subset of all documents in the index. A shard is a single copy of that subset. In SolrCloud, it can be a leader or a replica.
Partition/slice  A subset of the whole index replicated on one or more nodes. A slice is basically composed of all shards (leader and replicas) belonging to the same subset.
Leader  Each shard has one node identified as its leader. This role is crucial for the update workflow: all the updates belonging to a partition are routed through the leader.
Replica  The replication factor determines the total number of copies each shard has. Among all of those copies, one is elected as the leader, while the others are called replicas. While querying can be done across all shards, updates are always directed (or forwarded by replicas) to leaders.
Replication factor  The number of copies of a shard (and hence, of a document) maintained by the cluster.
Collection  A core that is logically and physically distributed across the cluster. In our example, we have only one collection (C1).
Administration console
In a SolrCloud deployment, the administration console of each node will report an additional menu item called Cloud, where it's possible to get an overall view of the cluster. You can choose between several graphical representations of the cluster (tree, graph, and radial), but all of them have a common aim: giving an immediate overview of the cluster in terms of nodes, shards, and collections. This is a screenshot from the administration console of the SolrCloud used in this section:
Collections API
The Collections API is used to manage the cluster, including collections, shards, and metadata about the cluster. This interface is composed of a single HTTP service endpoint located at http://<hostname>:<port>/<context root>/admin/collections.
The Collections API accepts an action parameter, which is a mnemonic code associated with the command that we want to execute. Each command has its own set of parameters that depend on the goal of the command. The following table lists the allowed values for the action parameter (that is, the available commands):
Action  Description
CREATE  Creates a new collection.
RELOAD  Reloads a collection. This is used when a configuration has been changed in ZooKeeper.
DELETE  Deletes a collection.
LIST  Returns the names of the collections in the cluster.
CREATESHARD  Creates a new shard.
SPLITSHARD  Splits an existing shard into two new shards.
DELETESHARD  Deletes an inactive shard.
CREATEALIAS  Creates or replaces an alias for an existing collection.
DELETEALIAS  Deletes an alias.
ADDREPLICA  Adds a new replica for a given shard.
DELETEREPLICA  Deletes a replica of a shard.
CLUSTERPROP  Adds, edits, or deletes a cluster property.
MIGRATE  Moves documents between collections.
ADDROLE  Adds a role to a node. At the time of writing this book, the only supported role is overseer: the cluster leader responsible for shard assignments and node management operations.
REMOVEROLE  Removes a role from a node.
OVERSEERSTATUS  Returns the current status of the overseer, including some stats about service calls (for example, create collection and create shard).
CLUSTERSTATUS  Returns the cluster status, including shards, collections, replicas, aliases, and cluster properties.
REQUESTSTATUS  Returns the status of requests that have been executed asynchronously (for example, MIGRATE, SPLITSHARD, and CREATE).
ADDREPLICAPROP  Adds or replaces a replica property.
DELETEREPLICAPROP  Deletes a replica property.
BALANCESHARDUNIQUE  Distributes a given property evenly among the physical nodes that make up a collection.
The complete list of parameters for each command is available at https://cwiki.apache.org/confluence/display/solr/Collections+API.
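Every command is an HTTP request to the same endpoint, so a client mainly needs to assemble the right query string. The sketch below builds a CREATE request. CollectionsApiUrl is a hypothetical helper, while action, name, numShards, and replicationFactor are Collections API parameters. Depending on your setup, CREATE may need further parameters (for example, collection.configName), so check the reference page above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that builds Collections API request URLs. */
final class CollectionsApiUrl {

    private static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    /** Builds <solrBaseUrl>/admin/collections?action=...; params are appended in iteration order. */
    static String build(String solrBaseUrl, String action, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        query.add("action=" + enc(action));
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(enc(e.getKey()) + "=" + enc(e.getValue()));
        }
        return solrBaseUrl + "/admin/collections?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("name", "c1");              // hypothetical collection name
        params.put("numShards", "3");
        params.put("replicationFactor", "3");
        System.out.println(build("http://localhost:8983/solr", "CREATE", params));
    }
}
```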
Distributed search
Queries can be sent to any node, which then performs a full distributed search across the cluster, with load balancing and failover. SolrCloud also allows partial queries, that is, queries executed against a group of shards, a list of servers, or a list of collections.
Tip
If you are using Java on the client side, CloudSolrServer in Solrj greatly simplifies communication between the client, ZooKeeper, and the cluster. As a developer, you will work with the usual SolrServer interface.
Cluster-aware index
A drawback of the first distributed scenario we met (that is, shards) was that a client that wants to issue an update request needs to explicitly point to the target shard. This is no longer necessary in a SolrCloud context because, for a given shard, there could be more than one copy (that is, a leader and zero or more replicas). So the update path becomes the following:
Updates can be sent to any node in the cluster.
If the target node is the leader of the shard owning the document, the update is executed there, and then it is forwarded to all replicas.
If the target node is a replica, then the update request is forwarded to its leader, and the flow described in the previous point applies.
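The update path can be simulated in memory. This toy model uses hypothetical classes and covers a single shard, with no network, versioning, or failure handling. An update sent to a replica is forwarded to the leader, which applies it and then distributes it to every replica:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the SolrCloud update path for one shard (all names are hypothetical). */
final class UpdatePathSketch {

    static final class Node {
        final String name;
        final Map<String, String> docs = new LinkedHashMap<>();

        Node(String name) {
            this.name = name;
        }
    }

    final Node leader;
    final List<Node> replicas;
    final List<String> trace = new ArrayList<>(); // records every hop, for illustration

    UpdatePathSketch(Node leader, List<Node> replicas) {
        this.leader = leader;
        this.replicas = replicas;
    }

    /** Entry point: an update may be sent to any node of the shard. */
    void receive(Node target, String id, String value) {
        if (target != leader) {
            trace.add(target.name + " -> " + leader.name + " (forward to leader)");
        }
        applyOnLeader(id, value);
    }

    /** The leader applies the update locally, then distributes it to all replicas. */
    private void applyOnLeader(String id, String value) {
        leader.docs.put(id, value);
        trace.add(leader.name + " applied " + id);
        for (Node replica : replicas) {
            replica.docs.put(id, value);
            trace.add(leader.name + " -> " + replica.name + " (distribute)");
        }
    }

    public static void main(String[] args) {
        Node n1 = new Node("node1"), n2 = new Node("node2"), n3 = new Node("node3");
        UpdatePathSketch shard = new UpdatePathSketch(n1, List.of(n2, n3));
        shard.receive(n3, "9920", "Jazz Bass"); // deliberately sent to a replica
        shard.trace.forEach(System.out::println);
    }
}
```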
Tip
The CloudSolrServer in Solrj asks ZooKeeper about the leader's location before sending updates. Thus, requests are always targeted at leaders, avoiding additional network round-trips.
Summary
In this chapter, we described various ways in which you can deploy Solr. Each deployment scenario has specific features, advantages, and drawbacks that make a choice ideal for one context and bad for another. A good thing is that the different scenarios are not strictly exclusive; they follow an incremental approach. In an ideal context, things should start immediately with the perfect scenario that fits your needs. However, unless your requirements are clear right from the start, you can begin with a simple configuration and then change it, depending on how your application evolves.
In the next chapter, we will walk through some useful add-ons that are not part of the core distribution but are included in the Solr download bundle.
Chapter 7. Solr Extensions
Every popular open source project usually includes a contrib folder containing several extra modules that solve common use case implementation problems. In Solr, you can find such modules within the download bundle, as depicted in the following screenshot:
Suppose your data is in a relational database, an XML file with a custom format, or a mail server; you need to index data coming from a Content Management System (such as Drupal, Joomla!, or WordPress); or you have rich documents (such as PDFs or Microsoft Office documents) and you want to do some kind of automatic keyword extraction. In general, these requirements are not covered by the core part of Solr. You will have to plug in and configure those contribution modules.
The aim of this chapter is to describe such modules. In order to do that, we will make use of a preloaded sample Solr instance with those extensions. To start this instance, you have to check out the source project associated with the chapter, change the directory to the ch7 folder, and type this from the command line:
# mvn clean package cargo:run
If you checked out the project using Eclipse, you might have noticed that, under the src/dev/eclipse folder, there is a preconfigured launcher. Right-click on it and choose the Debug As… menu item.
Regardless of the way you choose, you will see something like this at the end:
[INFO] Jetty 8.1.15.v20140411 Embedded started on port [8983]
[INFO] Press Ctrl-C to stop the container…
This means that the sample instance is up and running. This chapter will cover the following points:
Importing data from several data sources
Text and metadata extraction from digital documents
Language identification
Solritas (that is, Solr and Velocity)
Other contrib modules
DataImportHandler
The DataImportHandler is a module that enables Solr to load data from several types of data sources. The most frequent type of storage where applications put their data is undoubtedly a relational database, but in general, we could have a lot of scenarios here: file systems, websites, emails, FTP servers, LDAP, NoSQL databases, and so on.
The DataImportHandler module, other than providing a lot of ready-to-use connectors, is an extensible framework where developers are free to inject their storage-specific connector logic. The configuration happens in two different places: the first is the solrconfig.xml file (as usual), where the handler is declared as follows:
<requestHandler name="/import"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dih-config.xml</str>
  </lst>
</requestHandler>
The second is the handler configuration file (in the preceding example, we called it dih-config.xml). Although the specific content of that file could vary, mainly depending on the kind of data source we are using, the building blocks of a DataImportHandler domain are data sources, documents, entities, fields, transformers, and processors.
Data sources
A data source is a collection of records that store data. Although you are probably thinking of relational databases, data sources can also be associated with other kinds of sources and protocols, such as websites (HTTP), FTP servers, LDAP, mail servers, and so on.
A data source declaration is probably the first thing you will meet in a DataImportHandler configuration file. First of all, you must declare where your data is:
<dataSource
    type="JdbcDataSource"
    driver="com.mysql.jdbc.Driver"
    url="jdbc:mysql://host/database-name"
    user="database_username"
    password="database_password"/>
<dataSource
    type="FileDataSource"
    encoding="UTF-8"/>
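Note that it's possible to declare more than one data source (for example, a database and a file system, or two different databases). In that case, give each one a name attribute so that entities can refer to it through their dataSource attribute. Each data source has its own specific properties that depend on its nature. The following table describes the available data sources: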
Name  Description
JdbcDataSource  Connects to a database (through a direct connection or a JNDI data source) using a JDBC driver. Note that Solr doesn't ship with any JDBC driver. You must obtain it separately and put that library on the server classpath or under the core lib folder.
URLDataSource  Reads character files using HTTP.
BinURLDataSource  Reads binary files using HTTP.
FileDataSource  Reads from local character files.
BinFileDataSource  Reads from local binary files.
ContentStreamDataSource  Reads from the ContentStream of a POST request using java.io.Reader.
BinContentStreamDataSource  Reads from the ContentStream of a POST request using java.io.InputStream.
FieldReaderDataSource  Used in conjunction with other data sources when a given field contains text that needs further processing (for example, when it contains an XML document).
FieldStreamDataSource  Used in conjunction with other data sources when a given field contains binary content that needs further processing (for example, when it contains the value of a BLOB database column).
Documents, entities, and fields
Mapping between external data and Solr is done using documents, entities, and fields.
A document represents a logical type (such as products, books, or associations). It contains one or more entities.
Entities are called root entities or sub-entities depending on their nesting level. Root entities are direct children of a document. Sub-entities are children of another entity. They have a relationship with their parents; within their configuration, it's possible to use an expression language to refer to their parents.
Fields are the concrete places where the mapping between the external data source and the Solr document occurs. The following figure illustrates these relationships:
A single document can have one or more root entities. Each entity defines the logic to gather its data and populate its fields.
In the following example, a Solr schema contains books. Each book consists of an identifier (id), a title (title), and one or more authors. There are two database tables, BOOKS and AUTHORS, with a 1:n relationship (this means that a book can have more than one author).
First, let's see how the root entity (the book) is defined:
<document name="books">
  <entity name="book" dataSource="my-ds"
      query="SELECT BOOK_ID, TITLE FROM BOOKS" onError="skip">
    <field column="BOOK_ID" name="id"/>
    <field column="TITLE" name="title"/>
  </entity>
</document>
As you can see, the entity is associated with a data source called my-ds. It is configured with a query, and for each record of the resulting ResultSet, we are interested in two fields: BOOK_ID and TITLE. They are mapped to the id and title fields in the Solr schema.
Tip
If the name of the column (or the alias) in the ResultSet coincides with the name of the Solr field (case insensitive), the <field> declaration can be omitted. Solr will perform the mapping automatically. So, in the preceding example, the TITLE mapping can be removed.
Now, since the cardinality of the relationship between books and authors is 1:n, we need to define a sub-entity. For each book, we must query the data source again to find the corresponding authors:
<entity name="book" dataSource="my-ds" query="SELECT BOOK_ID, TITLE FROM BOOKS" onError="skip">
  <field column="BOOK_ID" name="id"/>
  <field column="TITLE" name="title"/>
  <entity name="author" dataSource="my-ds"
      query="SELECT NAME FROM AUTHORS WHERE BOOK_ID=${book.BOOK_ID}">
    <field column="NAME" name="author"/>
  </entity>
</entity>
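The author sub-entity declares a query on the AUTHORS table (since a book can have several authors, the author field must be multivalued in the Solr schema). It uses a simple expression language to refer to the identifier of the current (parent) book:
${<parent entity name>.<database alias or column name>}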
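Conceptually, the placeholder is resolved against the current row of the parent entity before the sub-entity query runs, once per parent row. The following toy resolver illustrates the idea; it is not the DataImportHandler implementation. PlaceholderResolver is a hypothetical class. It is case-sensitive and fails fast on unknown placeholders; both are choices of this sketch.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy resolver for ${entity.column} placeholders (hypothetical, not DIH code). */
final class PlaceholderResolver {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^.}]+)\\.([^}]+)\\}");

    /** Replaces each ${entity.column} with the value of that column in the named entity's current row. */
    static String resolve(String template, Map<String, ? extends Map<String, ?>> rowsByEntity) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Map<String, ?> row = rowsByEntity.get(m.group(1));
            Object value = row == null ? null : row.get(m.group(2));
            if (value == null) {
                throw new IllegalArgumentException("Unresolved placeholder " + m.group());
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value.toString()));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Current row of the parent "book" entity (hypothetical values)
        String sql = resolve("SELECT NAME FROM AUTHORS WHERE BOOK_ID=${book.BOOK_ID}",
                Map.of("book", Map.of("BOOK_ID", 42)));
        System.out.println(sql);
    }
}
```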
Obviously, this is a really simplified example. In a real production scenario, you will probably meet complicated relational schemas, but the DataImportHandler logic will always be the same: detect and configure entities or fields in order to denormalize your data model.
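The following sketch makes the denormalization concrete. It does in plain Java what the book and author entities do: for each book row, it collects the matching author rows into a multivalued author field. The class name and the data are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch of the book/author entities: one flat document per book. */
final class DenormalizeSketch {

    /**
     * books maps BOOK_ID to TITLE; each authors row is {BOOK_ID, NAME}.
     * Returns one document per book, with a multivalued "author" field.
     */
    static List<Map<String, Object>> toDocuments(Map<String, String> books, List<String[]> authors) {
        List<Map<String, Object>> docs = new ArrayList<>();
        for (Map.Entry<String, String> book : books.entrySet()) { // root entity: one row per book
            List<String> names = new ArrayList<>();
            for (String[] row : authors) {                         // sub-entity: WHERE BOOK_ID = parent id
                if (row[0].equals(book.getKey())) {
                    names.add(row[1]);
                }
            }
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put("id", book.getKey());
            doc.put("title", book.getValue());
            doc.put("author", names);
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        Map<String, String> books = new LinkedHashMap<>();
        books.put("1", "Book A");
        books.put("2", "Book B");
        List<String[]> authors = List.of(
                new String[] {"1", "Author X"},
                new String[] {"2", "Author Y"},
                new String[] {"2", "Author Z"});
        toDocuments(books, authors).forEach(System.out::println);
    }
}
```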
Transformers
A transformer is a function associated with an entity (root or nested) that can manipulate the fields fetched by the entity itself. The transformer must be declared as an attribute of the target entity:
<entity name="author" transformer="script:createAuthorFullName">
The corresponding function will be called for each set of fields (record) fetched by the query associated with the entity. The function has complete control over the fetched record. It can remove, add, or replace fields.
In the previous example, the Solr schema includes an author field that is supposed to hold the complete name of the author (for example, Dante Alighieri). Now let's imagine that the AUTHORS table instead contains two separate columns: FIRST_NAME and LAST_NAME. With the help of a built-in script transformer, we can write a simple JavaScript function to combine the two fields:
<script><![CDATA[
  function createAuthorFullName(record) {
    var first = record.remove('FIRST_NAME');
    var last = record.remove('LAST_NAME');
    // Join the two parts with a space
    record.put('author', first + ' ' + last);
    return record;
  }
]]></script>
Note how we manipulated the current record by adding a new field (author) and removing the LAST_NAME and FIRST_NAME fields.
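For comparison, here is the same row manipulation in plain Java. It is a sketch (AuthorFullName is a hypothetical class) that works on a java.util.Map, as the script does. In a custom DataImportHandler transformer, similar logic would live in the transformRow(Map<String, Object>, Context) method. Unlike the script, this version does not produce the text "null" when one of the two columns is missing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Plain-Java version of the createAuthorFullName script (no Solr classes involved). */
final class AuthorFullName {

    /** Removes FIRST_NAME and LAST_NAME and stores their space-separated join in "author". */
    static Map<String, Object> createAuthorFullName(Map<String, Object> record) {
        Object first = record.remove("FIRST_NAME");
        Object last = record.remove("LAST_NAME");
        StringJoiner fullName = new StringJoiner(" ");
        if (first != null) {
            fullName.add(first.toString());
        }
        if (last != null) {
            fullName.add(last.toString());
        }
        record.put("author", fullName.toString());
        return record;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("FIRST_NAME", "Dante");
        row.put("LAST_NAME", "Alighieri");
        System.out.println(createAuthorFullName(row).get("author"));
    }
}
```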
The following table lists the available built-in transformers:
Name  Description
ScriptTransformer  Executes a function written in JavaScript or another scripting language supported by Java.
DateFormatTransformer  Creates java.util.Date instances from string literals.
HTMLStripTransformer  Strips HTML tags from field values.
LogTransformer  Logs messages using a given template.
NumberFormatTransformer  Creates number instances from string literals.
RegexTransformer  Uses regular expressions to manipulate data in fields.
TemplateTransformer  Puts values in a column by resolving an expression containing other columns. For example, the concatenation we got with the ScriptTransformer can also be done using this transformer:
<field column="author" template="${author.FIRST_NAME} ${author.LAST_NAME}"/>
A transformer is simply a class that extends org.apache.solr.handler.dataimport.Transformer, so if the built-in portfolio doesn't meet your needs, it is always possible to create a custom implementation.
Entity processors
Each entity is handled by a so-called EntityProcessor, which defaults to SqlEntityProcessor. This is because the relational database is the most popular type of data source.
However, when using a different data source such as HTTP, files, or streams, the entity management logic will probably have its own specific requirements that fall outside the area covered by SqlEntityProcessor. In these cases, you can override the default by explicitly declaring an EntityProcessor for a given entity (through its processor attribute).
As usual, there are a lot of built-in EntityProcessor implementations, but it is always possible to create a custom one by extending the org.apache.solr.handler.dataimport.EntityProcessor class.
The following table lists and describes the available entity processors:
Name  Description
SqlEntityProcessor  This is the default entity processor assigned to each entity. It provides support to read and cache data from databases. It is used in conjunction with JdbcDataSource.
FileListEntityProcessor  Enumerates the list of files from a file system based on criteria specified in the associated entity (for example, base path, recursive, and file name pattern).
LineEntityProcessor  Reads from a data source on a line-by-line basis and produces a field called rawLine for each line read.
MailEntityProcessor  Handles emails and attachments from POP3 or IMAP sources.
PlainTextEntityProcessor  Reads from a data source and returns a field called plainText. This field contains a string representing the source content.
SolrEntityProcessor  Reads values from another Solr instance using Solrj. Each returned record is a SolrDocument instance.
TikaEntityProcessor  Extracts metadata and text from rich documents by means of Apache Tika. Later, we will see the Content Extraction Library, which also uses Tika as the extraction engine.
XPathEntityProcessor  Uses a streaming XPath parser to extract values from XML documents.
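FileListEntityProcessor enumerates files before handing each one to a nested entity. The following sketch gives an idea of that step by applying the three criteria mentioned in the table. It is a plain-JDK approximation, not the processor's code, and FileListSketch is a hypothetical class. In a DataImportHandler configuration, the corresponding entity attributes are baseDir, fileName (a regular expression), and recursive.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Rough plain-JDK analogue of FileListEntityProcessor's base path / recursive / pattern criteria. */
final class FileListSketch {

    /** Lists regular files under basePath whose name matches fileNameRegex, optionally recursing. */
    static List<Path> listFiles(Path basePath, boolean recursive, String fileNameRegex) throws IOException {
        Pattern pattern = Pattern.compile(fileNameRegex);
        int depth = recursive ? Integer.MAX_VALUE : 1; // depth 1 = direct children only
        try (Stream<Path> paths = Files.walk(basePath, depth)) {
            return paths.filter(Files::isRegularFile)
                    .filter(p -> pattern.matcher(p.getFileName().toString()).matches())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path base = args.length > 0 ? Path.of(args[0]) : Path.of(".");
        listFiles(base, true, ".*\\.xml").forEach(System.out::println);
    }
}
```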
Event listeners
The document element in the DataImportHandler configuration allows us to declare two event listeners that intercept the most relevant events of a data import lifecycle, onImportStart and onImportEnd:
<document
onImportStart="com.foo.MyImportStartEventListener"
onImportEnd="com.foo.MyImportEndEventListener">
The event listeners must implement the org.apache.solr.handler.dataimport.EventListener interface, which gives them access (by means of an org.apache.solr.handler.dataimport.Context instance) to most DataImportHandler objects and to event statistics, such as documents skipped, indexed, failed, and so on.
Content Extraction Library
The Content Extraction Library (also known as Solr Cell) integrates the popular Apache Tika framework to detect and extract metadata and text from a large variety of file types, such as PDF, Microsoft Office, LibreOffice, and OpenOffice documents.
Apache Tika provides a façade parser interface on top of several low-level frameworks that are able to manage and manipulate specific file types (for example, PDFBox for PDFs and Apache POI for Microsoft documents). Its simple interface also provides automatic MIME type detection, so the framework itself is able to select the correct parser for a given file.
On the Solr side, a dedicated ExtractingRequestHandler will be in charge of getting the input data (files) sent by clients and extracting metadata and text by means of Tika.
The configuration of ExtractingRequestHandler follows the same procedure that we saw for the other handlers. Specifically, it has to be declared in solrconfig.xml, as follows:
<requestHandler name="/update/extract"
    class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    …
  </lst>
</requestHandler>
Solr Cell has several options that can be configured to fine-tune its behavior. Most of them are related to metadata handling, field name mapping, and custom Tika configuration.
Tip
For a complete list of all configuration parameters, go to https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika
The src/solr/solr-home/example-data folder in the example project contains a document that can be sent to Solr Cell. Open a shell and type the following (replace the PROJECT_HOME placeholder with your ch7 project local path):
# curl "http://localhost:8983/solr/example/update/extract?commit=true" -F data=@PROJECT_HOME/ch7/src/solr/solr-home/example-data/libreoffice-writer.odt
Wait for a moment, and then you should see a response like this:
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">572</int>
  </lst>
</response>
The document (the LibreOffice document in this case, but you can also try other files) has been indexed. If you open the browser and type http://127.0.0.1:8983/solr/example/select?q=stream_name:libreoffice-writer.odt&indent=true, you will see that the XML response shows the extracted text (in the text field) and all the metadata fields that have been detected for that document.
Language Identifier
The Language Identifier extension detects the language (or languages) of fields belonging to a given document. This is a very useful add-on to use in conjunction with the previously described extraction library, to get additional information about data that has been indexed.
The component is implemented as an UpdateRequestProcessor subclass that intercepts and analyzes the incoming data:
<processor
    class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory">
  <str name="langid.fl">text</str>
  <str name="langid.langField">language</str>
  <str name="langid.fallback">en</str>
</processor>
As you can see, this processor can be configured with several options. We can declare the fields of the incoming documents that must be analyzed, the name of the field that will hold the results of language detection, and a default fallback language in case no detection is possible.
Tip
In the example project associated with this chapter, you will find a solrconfig.xml file where the chain is already defined, but the UpdateRequestProcessor is commented out. Just remove the comment markers, reload the core using the Administration Console, and reindex the documents under the example-data folder, following the same procedure as we described in the previous section. At the end, you will see an additional “language” field in each document; that is the result of the language detection component.
You should know that declaring the processor within the solrconfig.xml file is not enough. We need to insert it into an update request processor chain, and finally associate that chain with an UpdateRequestHandler. Only those update requests that are received by that handler will pass through the language detection analysis chain.
Rapid prototyping with Solritas
Solritas is the name of a contribution module that integrates Solr with Apache Velocity. It is basically a response writer that uses the Apache Velocity template engine to render Solr responses with a graphical user interface.
A set of ready-to-use Velocity templates is combined with Solr responses in order to provide a search GUI with a lot of features (for example, faceting, highlighting, and autocompletion).
Tip
You can find the Velocity templates under the src/solr/solr-home/example/conf/velocity folder of the ch7 project, or under the example/solr/collection1/conf/velocity folder of the Solr download bundle.
As this GUI is produced directly by transforming the outgoing Solr responses, there's no need for an external web application to execute searches and graphically see the corresponding results.
Okay, one could now say, “This is already possible with the Solr REST services”, but that is definitely more technically complex, and the search results are displayed in XML, JSON, or whatever format. Here, a more user-friendly interface is provided, as shown in the following screenshot:
That makes Solritas an ideal choice to build rapid prototypes. The sample instance you started at the beginning of this chapter has Solritas configured in solrconfig.xml. It responds to the /solritas endpoint, so after indexing some data as described in the previous sections, open your browser and type http://127.0.0.1:8983/solr/example/solritas.
Tip
The Velocity templates have been copied from the Solr download bundle, so some areas (such as Google Maps widgets, spatial queries, and range queries) might not be visible or might not make sense with the chapter's sample data. If you want to see all of them in action, just start the Solr example in the download bundle and navigate to the http://127.0.0.1:8983/solr/browse address.
You should see the Solritas results page, which is preloaded with a *:* query by default.
Other extensions
The contrib folder contains other modules or plugins that are briefly described in the following sections.
Clustering
The clustering module is a framework used to plug in third-party (clustering) implementations. At the time of writing this book, it provides support for clustering search results using the Carrot2 project.
The Solr example that comes with the download bundle already contains a ClusteringComponent within the solrconfig.xml configuration file. The declaration happens in two phases. First, the component has to be configured:
<searchComponent
    name="clustering"
    enable="${solr.clustering.enabled:false}"
    class="solr.clustering.ClusteringComponent">
  <lst name="engine">
    <str name="name">lingo</str>
    <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
    <str name="carrot.resourcesDir">clustering/carrot2</str>
  </lst>
  …
</searchComponent>
After this, as with any other SearchComponent, you should enable it by including its name in the RequestHandler instance where it is supposed to be used:
<requestHandler name="/myRequestHandler" class="solr.SearchHandler">
  …
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>
Inthisway,itcancontributetosearchresultsbyaddinga“clusters”section,likethis:
<response>
  <result>
    …
  </result>
  <arr name="clusters">
    <lst>
      <arr name="labels">
        <str>iPod</str>
      </arr>
      <double name="score">1.3174612693376382</double>
      <arr name="docs">
        <str>F8V7067-APL-KIT</str>
        <str>IW-02</str>
        …
      </arr>
    </lst>
    <lst>
      <arr name="labels">
        <str>Hard Drive</str>
      </arr>
      …
    </lst>
    …
  </arr>
</response>
If you want to try this yourself, open a shell and type the following commands:
# cd $INSTALL_DIR/example
# java -Dsolr.clustering.enabled=true -jar start.jar
These will start Solr with the ClusteringComponent enabled. Now, in another shell, type this:
# cd $INSTALL_DIR/example/exampledocs
# ./post.sh *.xml
Finally, open a browser and execute this query: http://localhost:8983/solr/clustering?q=*:*&rows=10
You should get a response similar to the preceding example, with the “clusters” section at the bottom.
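If you consume that response from Java instead of a browser, the clusters section maps naturally onto a small object model. The following sketch uses only the JDK’s DOM and XPath APIs; it assumes the usual Solr XML layout, in which each cluster is an <lst> element inside the top-level clusters array, and it reads top-level clusters only. ClusterReader and Cluster are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class ClusterReader {

    /** A single cluster: its labels, its score, and the unique keys of its documents. */
    static final class Cluster {
        final List<String> labels;
        final double score;
        final List<String> docs;

        Cluster(List<String> labels, double score, List<String> docs) {
            this.labels = labels;
            this.score = score;
            this.docs = docs;
        }

        @Override
        public String toString() {
            return labels + " score=" + score + " docs=" + docs;
        }
    }

    /** Extracts the top-level clusters from a Solr XML response. */
    static List<Cluster> read(String responseXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(responseXml.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    "/response/arr[@name='clusters']/lst", doc, XPathConstants.NODESET);
            List<Cluster> clusters = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node cluster = nodes.item(i);
                double score = (Double) xpath.evaluate(
                        "double[@name='score']", cluster, XPathConstants.NUMBER);
                clusters.add(new Cluster(
                        texts(xpath, "arr[@name='labels']/str", cluster),
                        score,
                        texts(xpath, "arr[@name='docs']/str", cluster)));
            }
            return clusters;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse clustering response", e);
        }
    }

    /** Returns the text content of every node matched by expr, relative to context. */
    private static List<String> texts(XPath xpath, String expr, Node context)
            throws XPathExpressionException {
        NodeList nodes = (NodeList) xpath.evaluate(expr, context, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    public static void main(String[] args) {
        String sample = "<response><result name=\"response\"/>"
                + "<arr name=\"clusters\"><lst>"
                + "<arr name=\"labels\"><str>iPod</str></arr>"
                + "<double name=\"score\">1.3174612693376382</double>"
                + "<arr name=\"docs\"><str>F8V7067-APL-KIT</str><str>IW-02</str></arr>"
                + "</lst></arr></response>";
        for (Cluster cluster : read(sample)) {
            System.out.println(cluster);
        }
    }
}
```

Each Cluster carries the unique keys of its documents, which can then be matched against the main result list, for example to render results grouped by topic.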
UIMA Metadata Extraction Library
This module integrates Apache UIMA into Solr by providing a powerful Metadata Extraction Library that can be used for tasks such as automatic keyword extraction and Named Entity Recognition (for example, places, names, concepts, and dates).
The plugin is available either as an UpdateRequestProcessor subclass, to decorate the index process chain, or as a set of Tokenizers/Filters, to add such behavior in the (index or query) text analysis phase.
Using this module, you can enrich your Solr documents with additional metadata extracted from the input data. UIMA provides an analysis engine that involves several components arranged in a pipeline. The default pipeline supports the use of existing analysis engines such as Alchemy or OpenCalais. Keep in mind that these engines are not free of charge, but they provide a free trial period: you can register and obtain an API key, which must be configured in the solrconfig.xml file. Other components are used for language and sentence detection.
Note: Under the contrib/uima folder, you will find a README file with detailed information about the usage of the Solr UIMA module.
The UIMA UpdateRequestProcessor intercepts the documents that are being indexed and sends them to its analysis pipeline. Those documents are automatically enriched with extracted information such as sentences, languages, or named entities (for example, places or names).
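Conceptually, that enrichment step is a pipeline of annotators applied to each incoming document before it is indexed. The plain-Java sketch below illustrates only the idea: it is not the UIMA or Solr API, EnrichmentPipeline and Annotator are hypothetical names, and the two annotators (a regular-expression sentence splitter and a gazetteer, that is, a dictionary lookup of known place names) are deliberately naive stand-ins for real analysis engines.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class EnrichmentPipeline {

    /** Extracts metadata from a piece of text, as (field name -> values). */
    interface Annotator {
        Map<String, List<String>> annotate(String text);
    }

    /** Naive sentence detection: split after '.', '!' or '?' followed by whitespace. */
    static final Annotator SENTENCES =
            text -> Map.of("sentence", List.of(text.trim().split("(?<=[.!?])\\s+")));

    /** Naive named entity recognition: look up each word in a list of known places. */
    static Annotator places(Set<String> knownPlaces) {
        return text -> {
            List<String> found = new ArrayList<>();
            for (String word : text.split("\\W+")) {
                if (knownPlaces.contains(word) && !found.contains(word)) {
                    found.add(word);
                }
            }
            return Map.of("place", found);
        };
    }

    /**
     * Returns a copy of the document in which the text of sourceField has been run through
     * every annotator, and each extracted value has been added to the corresponding field.
     */
    static Map<String, List<String>> enrich(
            Map<String, List<String>> document, String sourceField, List<Annotator> pipeline) {
        Map<String, List<String>> enriched = new LinkedHashMap<>();
        // Copy the lists so the caller's document is never modified.
        document.forEach((field, values) -> enriched.put(field, new ArrayList<>(values)));
        for (String text : document.getOrDefault(sourceField, List.of())) {
            for (Annotator annotator : pipeline) {
                annotator.annotate(text).forEach((field, values) ->
                        enriched.computeIfAbsent(field, k -> new ArrayList<>()).addAll(values));
            }
        }
        return enriched;
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("id", List.of("1"));
        doc.put("text", List.of("I flew from Rome to Milan. Then I took a train to Turin."));
        System.out.println(enrich(doc, "text",
                List.of(SENTENCES, places(Set.of("Rome", "Milan", "Turin")))));
    }
}
```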
MapReduce
The MapReduce contrib module provides integration with Apache Hadoop. MapReduce is the name of a paradigm (programming model) that is implemented in Apache Hadoop to process large datasets with a parallel and distributed algorithm.
The module contains a MapReduce job that builds Solr indexes and merges them into a Solr cluster.
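The MapReduce programming model itself fits in a few lines of plain Java. The sketch below runs the classic word count in memory: the map step turns each input record into (key, value) pairs, the pairs are grouped by key (the shuffle), and the reduce step combines the values of each key. It only illustrates the paradigm; it is neither Hadoop code nor the Solr contrib module (whose job builds Solr indexes rather than counting words), and WordCountMapReduce is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class WordCountMapReduce {

    /** Map phase: emit a (word, 1) pair for every word of one input record. */
    static Stream<Map.Entry<String, Integer>> map(String record) {
        return Arrays.stream(record.toLowerCase().split("\\W+"))
                .filter(word -> !word.isEmpty())
                .map(word -> Map.entry(word, 1));
    }

    /**
     * Runs map over all records in parallel, then groups the pairs by key (shuffle) and
     * sums the values of each key (reduce). Keys come back sorted.
     */
    static Map<String, Integer> run(List<String> records) {
        return records.parallelStream()
                .flatMap(WordCountMapReduce::map)
                .collect(Collectors.groupingBy(
                        Map.Entry::getKey,
                        TreeMap::new,
                        Collectors.summingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("Solr is fast", "Solr is open source")));
        // prints {fast=1, is=2, open=1, solr=2, source=1}
    }
}
```

Because each record is mapped independently, the map step can run on many machines at once; only the grouping step needs to bring pairs with the same key together.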
Summary
In this chapter, we illustrated a set of contrib modules that are not part of the Solr core but are definitely useful in a lot of real-world scenarios. The Solr download bundle contains all of them, and their installation is very easy. Each module folder has a README file that guides you through the installation and setup steps (basically, it’s just a matter of copying, pasting, and configuring).
In the next chapter, we will conclude our Solr journey with an overview of the Solr codebase. You will learn how to work with it and, possibly, how to contribute to the open source community process.
Chapter 8. Contributing to Solr
A friend of mine used to say, “Is there a better way to start a new year than contributing to an open source project?” I strongly agree; a great way to get involved in the open source world is to contribute to the projects you’re using.
As a user of open source software, you are already part of that world, and an important part, because users are what make that software useful. But there’s more; you can delve more deeply into what actually happens behind the scenes.
By the end of this chapter, you will have a good understanding of the following topics:
The constituent pieces of the open source world
The Apache contribution process
How to work with the Solr source code in your IDE
Identifying your needs
Why are you interested in the open source contribution process? Why do you want to have the Solr source code in your IDE? These are crucial questions you should answer before doing all that is described in this chapter. In my opinion, you will probably fall into one of these scenarios:
Curiosity: You want to inspect and see with your own eyes how things work behind the scenes.
Bug fixing: You want to fix a bug that you encountered in your Solr installation. In this way, you will satisfy your customer, and the community will benefit from your work.
Improvement: You’ve got an idea about an interesting feature that is not yet implemented. Probably, a customer requirement led to that idea, and you believe that, once implemented and integrated into Solr, it could be useful for other users.
Wanting to contribute: You simply want to contribute by fixing an existing issue and participating in the development/contribution process.
While curiosity could be a good reason to start investigating the source code, sooner or later (and, I would add, most probably), you will fall into one of the other categories. At that point, you will necessarily start communicating with other people and with the communities associated with the project.
Tip: You can find a general introduction to the Apache contribution process at http://www.apache.org/foundation/getinvolved.html.
That interaction involves some general aspects such as issue tracking, mailing lists, software development, and so on. Once you have identified your needs and goals, you can look at the upcoming sections for a description of those cross-cutting concepts.
An example – SOLR-3191
In 2013, I was working on an Online Public Access Catalogue (OPAC) project for a big library. The schema definition very soon became huge, because MARC, the standard representation for bibliographic records, is an old and proven standard that classifies each minimal piece of information about a catalog item.
Obviously, our customer required all that richness in the search application, so we started with a small schema and quickly ended up with a lot of fields.
Another requirement was the capability to download each item in MARCXML format (MARCXML is the XML representation of a MARC record) in the end user application. So, in order to satisfy that requirement, we put the whole MARC representation in a dedicated stored field called, not surprisingly, marc_xml.
What was the problem? On the Solr side, we defined a lot of SearchHandler instances, one for each kind of search (for example, any keyword, author, title, or subject). As you know, for each handler we have to declare, using the fl parameter, all the (stored) fields that must be in the search results.
In the first approach, we simply put a wildcard (*) as the value of the fl parameter, as most of those fields were needed in the user interface. But after the system had been running in production for a while, the IT department, in charge of monitoring the system, raised an issue about the network traffic between the frontend application and the Solr server. After doing some analysis, we discovered a lot of records with a huge marc_xml field returned to the client. “Ok,” one of the IT guys told us, “just exclude the marc_xml field from the fl parameter.”
The fl parameter accepts a list of fields that must be returned, but there’s no way to tell it what must not be in the search results. Eight handlers were defined in the solrconfig.xml file, and for each of them (later, we discovered the XInclude feature, but that’s another story), we had to declare all stored fields except the marc_xml field. This was terrible and unmaintainable!
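What was missing, in other words, was a way to say “all stored fields except these”. The following Java sketch shows the resolution logic such a feature implies: expand *, collect the names to exclude, and remove them from the result. The -name exclusion syntax, the FieldListResolver class, and the sample field names are hypothetical illustrations, not the syntax or code of the actual SOLR-3191 patch; Solr’s real fl handling also deals with globs, pseudo-fields, and document transformers, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class FieldListResolver {

    /**
     * Resolves a comma- or space-separated field list against the stored fields.
     * "*" expands to every stored field, a plain name is included as-is, and
     * "-name" (hypothetical syntax) removes that field from the result.
     */
    static List<String> resolve(List<String> storedFields, String fl) {
        Set<String> included = new LinkedHashSet<>();
        Set<String> excluded = new LinkedHashSet<>();
        for (String token : fl.trim().split("[,\\s]+")) {
            if (token.isEmpty()) {
                continue;
            } else if (token.equals("*")) {
                included.addAll(storedFields);
            } else if (token.startsWith("-")) {
                if (token.length() > 1) { // a lone "-" names no field and is ignored
                    excluded.add(token.substring(1));
                }
            } else {
                included.add(token);
            }
        }
        // Exclusions are applied last, so they win wherever they appear in the list.
        List<String> result = new ArrayList<>(included);
        result.removeAll(excluded);
        return result;
    }

    public static void main(String[] args) {
        List<String> stored = Arrays.asList("id", "title", "author", "subject", "marc_xml");
        System.out.println(resolve(stored, "*,-marc_xml"));
        // prints [id, title, author, subject]
    }
}
```

Because exclusions are applied after all inclusions, "-marc_xml,*" and "*,-marc_xml" give the same result in this sketch; with such a feature, each of the eight handlers could have declared a single short field list instead of every stored field.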
After googling a bit, I found several people facing the same problem, so I decided to look for an existing JIRA issue. That is how I came across the (unresolved) SOLR-3191 issue at https://issues.apache.org/jira/browse/SOLR-3191, which describes the problem:
SOLR-3191 field exclusion from fl
I think it would be useful to add a way to exclude field from the Solr response. If I have for example 100 stored fields and I want to return all of them but one, it would be handy to list just the field I want to exclude instead of the 99 fields for inclusion through fl
So I thought to myself: why don’t I try to implement that feature? And I did what I’m going to describe in this chapter. If you take a look at that issue, you will see that I submitted two patches and had some exchanges with a couple of Solr developers.
Subscribing to mailing lists
If you haven’t subscribed to a Solr mailing list (or lists) yet, you should do that before going ahead. The user and developer lists are the primary places where things such as doubts, questions, features, and bugs are discussed.
It’s mainly there that you should look for a solution to your problem and meet people with similar requirements. Like any other Apache project, Solr has the following mailing lists:
A user list – solr-user@lucene.apache.org
A dev list – dev@lucene.apache.org
A commits list – commits@lucene.apache.org
Every Solr user should be subscribed to the user list. Ideas and solutions from other users and developers will often save you from reinventing the wheel.
The dev list is meant for following or participating in discussions about Lucene and Solr internals, development, upcoming features, and so on. The focus here is more technical.
Finally, the commits list is used to receive notifications about every Solr or Lucene commit.
Subscribing to a list is very easy; just send an empty email to solr-user-subscribe@lucene.apache.org, dev-subscribe@lucene.apache.org, or commits-subscribe@lucene.apache.org, and then follow the procedure described in the reply.
Signing up on JIRA
The issue tracker is another important building block of the open source contribution process. Whenever an idea, question, bug, or feature becomes something that could affect the code, a new JIRA issue is filed, and everything related to it (for example, tasks, discussions, patches, code, and commit logs) is collected there.
Issues in JIRA are public, so if you only want to see or read them, there’s no need to have an account (you may have already read the SOLR-3191 issue on JIRA without having one).
However, if you want to participate in a discussion, post a patch, or create or update issues, you must sign up at https://issues.apache.org/jira/secure/Signup!default.jspa.
Once registered, you can sign in using the login form at https://issues.apache.org/jira/login.jsp.
That’s all! Welcome to the Apache Issue Tracker! Note that, before opening a new issue, it is always better to ping the dev list and discuss it. Maybe a similar issue already exists and someone is already working on it.
Setting up the development environment
Following the same approach used in the previous chapters, I will assume you have Eclipse installed. If that is not the case, that is, if you followed the examples using some other IDE (for example, IntelliJ), a few steps could be a bit different.
In order to be able to modify, build, and run Solr from the source code, you need the following:
An IDE such as Eclipse or IntelliJ
A Subversion client, which can be a standalone client (such as the svn command-line tool or TortoiseSVN) or a plugin in your IDE (for example, Subclipse or Subversive)
Apache ANT (http://ant.apache.org/bindownload.cgi)
Version control
Subversion is an open source version control system that is used to maintain the source code of the Apache projects, including Solr.
As a first step, you need to check out the Solr source code from the SVN repository. Depending on your role, you should point to one of the following addresses:
http://svn.apache.org/repos/asf/lucene/dev/<branch>
https://svn.apache.org/repos/asf/lucene/dev/<branch>
As you can see, the only difference between the preceding links is the protocol. The first link, which uses http, is for anonymous checkout, and the other, which uses https, is for committers. Committers are those people who have commit rights, that is, active members of the development community with write permissions on the repository. I assume you don’t fall within this last category, so the correct link for you is the first.
The link also contains a <branch> placeholder. This must be replaced with the target version you will work on, which strictly depends on the task you would like to do. If you want to fix a bug in a past version (for example, 4.7.2), you should point to the corresponding branch. If you want to pick up an existing enhancement or bug that has been scheduled for the next major release, you should point to the trunk. The following table describes how the repository tree (http://svn.apache.org/repos/asf/lucene/dev/) is organized:
Folder | Description
branches | Development branches.
branches/branch_5x | The development branch for the next version, 5.x.
…, branches/lucene_solr_3_6, …, branches/lucene_solr_4_10 | The development branches for versions that have been released. Apart from some tasks that have been scheduled for a given release, most of the development activity in these branches is bug fixing.
tags | When a new version is released, the corresponding source code is copied here, in a dedicated folder (for example, tags/lucene_solr_3_6_1 and tags/lucene_solr_4_10_3).
trunk | This is the main center of development.
The target branch depends on what you would like to do. If you pick up an existing JIRA issue, you will find the affected version among its attributes. Besides, you may want to fix an issue in an older version (for example, 3.6.1) because your customer is using that specific version.
Keep in mind that most development tasks are done in the trunk and then backported to the corresponding active development branch (under the branches folder). Anyway, before starting, it is always recommended to ping the dev list and explain what you want to do.
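The repository layout described above maps onto checkout URLs in a regular way: trunk sits directly under the base path, branches and tags add one folder level, and only the protocol differs between anonymous users and committers. The following Java sketch encodes those rules; SolrSvnUrls, Kind, and checkoutUrl are hypothetical names written for this example, and the base addresses are the two given earlier in this section.

```java
final class SolrSvnUrls {

    private static final String ANONYMOUS_BASE = "http://svn.apache.org/repos/asf/lucene/dev/";
    private static final String COMMITTER_BASE = "https://svn.apache.org/repos/asf/lucene/dev/";

    /** The three top-level folders of the repository tree. */
    enum Kind { TRUNK, BRANCH, TAG }

    /**
     * Builds a checkout URL. Anonymous users get the http address, committers the https one;
     * name is ignored for TRUNK and required for branches and tags (for example "branch_5x").
     */
    static String checkoutUrl(Kind kind, String name, boolean committer) {
        String base = committer ? COMMITTER_BASE : ANONYMOUS_BASE;
        switch (kind) {
            case TRUNK:
                return base + "trunk";
            case BRANCH:
                return base + "branches/" + requireName(name);
            case TAG:
                return base + "tags/" + requireName(name);
            default:
                throw new IllegalArgumentException("Unknown kind: " + kind);
        }
    }

    private static String requireName(String name) {
        if (name == null || name.trim().isEmpty()) {
            throw new IllegalArgumentException("A branch or tag name is required");
        }
        return name.trim();
    }

    public static void main(String[] args) {
        // The branch used in this chapter, checked out anonymously:
        System.out.println(checkoutUrl(Kind.BRANCH, "branch_5x", false));
        // prints http://svn.apache.org/repos/asf/lucene/dev/branches/branch_5x
        System.out.println(checkoutUrl(Kind.TAG, "lucene_solr_4_10_3", false));
    }
}
```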
Code style
One of the common problems in distributed development is agreeing on source code conventions: comments, naming conventions, and so on.
That’s the reason the Solr development team provided two useful configuration files: one for Eclipse and another for IntelliJ. These files can be imported into those IDEs to automate a lot of things such as indentation, brace positions, line wrapping, comments, and so on.
Download the file for your favorite IDE from one of the following addresses:
Eclipse: http://people.apache.org/~rmuir/Eclipse-Lucene-Codestyle.xml
IntelliJ: http://people.apache.org/~erick/Intellij-Lucene-Codestyle.xml
In Eclipse, the configuration file can be imported by going to Window | Preferences | Java | Code Style | Formatter and then clicking on the Import button, as shown in the following screenshot:
After that, navigate to Java | Editor | Save Actions. Select the Perform the selected actions on save checkbox and the Format edited lines radio button, as shown in this screenshot:
Checking out the code
Once you have identified the target branch to work on, check out the source code using the svn command-line tool or your favorite tool (for example, TortoiseSVN).
SOLR-3191 was considered a new feature at that time, so I checked out the trunk. The current trunk requires Java 8 in order to build, so to execute the steps in this chapter, let’s point to a different branch (branch_5x). Open a shell and type the following commands:
# cd /work/solrdev
# svn checkout http://svn.apache.org/repos/asf/lucene/dev/branches/branch_5x solr_5
Bear in mind the following:
I’m not a committer, so I pointed to the read-only (http) address.
The name of the local folder that will contain the downloaded source is solr_5. If it doesn’t exist, it will be automatically created.
The /work/solrdev/solr_5 folder is a local working folder on my machine. You can choose whatever name you like.
When you execute that command, a lot of files will be downloaded. In the end, you should see something like this:
…
A    solr_5/solr/test-framework/src/java/overview.html
A    solr_5/.hgignore
U    solr_5
Checked out revision 1651057.
Now the source code of Solr 5.x is on your machine.
Creating the project in your IDE
Getting the source code is not enough, unless you want to develop your patch using Vim. You will have to create a project in your IDE. Assuming you are in the /work/solrdev/solr_5 folder you created in the previous step, type the following:
# ant clean test
If Ivy (a dependency management tool that the build requires) is not installed on your machine, the ant command will immediately fail. No problem! There’s a dedicated task that can install Ivy for you. Type this command:
# ant ivy-bootstrap
You should see something like this:
…
ivy-bootstrap2:
ivy-checksum:
ivy-bootstrap:
BUILD SUCCESSFUL
Total time: 3 seconds
Now we can retry the first command:
# ant clean test
This will execute the whole test suite, which is huge, so take a long coffee break!
Tip: Although this step is not mandatory, it is strongly recommended to check the state of your build before making any change. In this way, you can see whether something is already failing for reasons that have nothing to do with your changes.
Once the test suite has been executed, type this command if you are using Eclipse:
# ant eclipse
If you are using IntelliJ, type the following command:
# ant idea
This will generate the IDE project files within the current directory (solr_5). From here on, I will assume you’re using Eclipse, but the steps are basically the same for IntelliJ.
Open Eclipse and create a new workspace (you can also use the workspace where you loaded the sample projects of this book).
Open the File menu and choose Import. From the dialog that appears, go to General | Existing Projects into Workspace. Using the Browse button, select the /work/solrdev/solr_5 folder. Press OK and then Finish. The dialog will close and the project will be imported, as shown in this screenshot:
Once the project has been built, you shouldn’t have any errors. Everything is ready, and you can proceed with your change.
Making your changes
We won’t dig very deep into this step because it basically depends on the nature of the task you picked up. For instance, my SOLR-3191 patch changes four existing classes to implement that specific behavior.
Since nobody knows you yet, and your changes will hopefully be integrated into a very popular framework, the most important things to keep in mind are as follows:
Correctness: The implementation must do what it is supposed to do, according to the requirements expressed in the JIRA issue
Documentation: Javadoc at class and method levels (don’t include the @author tag)
Unit tests: These describe and validate your changes
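As a small illustration of the last two points, here is the shape of a documented method together with its tests. FieldExclusions, FieldExclusionsTest, and the -field syntax are hypothetical; Solr’s real tests are written against its own test framework and run through ant, whereas this self-contained sketch checks itself in a plain main method. Note the Javadoc at class and method level and the absence of an @author tag.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Helper methods for reading field exclusions from a field list such as {@code "*,-marc_xml"}.
 * The {@code -name} exclusion syntax is a hypothetical example.
 */
final class FieldExclusions {

    private FieldExclusions() {
        // Static utility class: no instances.
    }

    /**
     * Returns the names of the fields that the given field list asks to exclude.
     *
     * @param fl a comma- or space-separated field list; may be {@code null}
     * @return the excluded field names in order of appearance, never {@code null}
     */
    static Set<String> excludedFields(String fl) {
        Set<String> excluded = new LinkedHashSet<>();
        if (fl == null) {
            return excluded;
        }
        for (String token : fl.split("[,\\s]+")) {
            // A lone "-" is not a field name, so it is ignored.
            if (token.length() > 1 && token.startsWith("-")) {
                excluded.add(token.substring(1));
            }
        }
        return excluded;
    }
}

/** Plain-Java stand-in for a unit test class: one check per documented behavior. */
final class FieldExclusionsTest {

    private static void check(boolean condition, String description) {
        if (!condition) {
            throw new AssertionError("Failed: " + description);
        }
    }

    public static void main(String[] args) {
        check(FieldExclusions.excludedFields("*,-marc_xml").equals(Set.of("marc_xml")),
                "a single exclusion is recognized");
        check(FieldExclusions.excludedFields("id title").isEmpty(),
                "a list without exclusions yields no exclusions");
        check(FieldExclusions.excludedFields(null).isEmpty(),
                "a missing field list yields no exclusions");
        check(FieldExclusions.excludedFields("-a, -b").size() == 2,
                "commas and spaces both work as separators");
        System.out.println("All checks passed");
    }
}
```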
Returning to the SOLR-3191 example, I changed two classes:
org.apache.solr.search.ReturnFields
org.apache.solr.search.SolrReturnFields
These classes contain the logic required by the issue. At the same time, I updated two test case classes with several unit tests demonstrating and validating my changes:
org.apache.solr.search.ReturnFieldsTest
org.apache.solr.search.TestPseudoReturnFields
During development, it’s better to periodically execute the test suite in order to ensure that your changes didn’t introduce any side effects.
Tip: When working in a distributed development environment, it is strongly recommended that you run an svn update command frequently. In this way, you will always be working with the latest version of the branch you checked out.
Okay, take your time and make your changes. Remember to post a message on the issue page in JIRA for every relevant doubt. In this way, the whole history of your work will be in one place.
Creating and submitting a patch
Once the implementation has been completed, everything is working, and the tests are green, it’s time to submit the patch.
Before doing that, open a shell in the /work/solrdev/solr_5 working folder and type this:
# ant precommit
This task will look for problems related to tab indentation, author tags, and broken or wrong links in Javadoc. When it completes, type the following command:
# svn stat
You will see a list of the files that differ from the checked-out revision. Files marked with ? are new files that are not yet under version control, so svn diff would leave them out of the patch. If all of them belong to your changes, type this command to schedule them for addition:
# svn stat | grep "^?" | awk '{print $2}' | xargs svn add
Alternatively, you can add those files one by one, using the following command:
# svn add <file>
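For readers less familiar with shell pipelines, the following Java sketch performs the same filtering: it keeps the svn stat lines marked with ? (unversioned) and extracts their paths. Unlike awk '{print $2}', which takes only the second whitespace-separated token, it keeps paths containing spaces in one piece. UnversionedFiles and the sample paths (including MyNewTest.java) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class UnversionedFiles {

    /** Returns the paths of the entries that svn status reports as unversioned ("?"). */
    static List<String> parse(List<String> statusLines) {
        List<String> paths = new ArrayList<>();
        for (String line : statusLines) {
            if (line.startsWith("?")) {
                // The path follows the status columns; trimming the gap (rather than
                // taking the second whitespace-separated token, as awk does) keeps
                // paths with embedded spaces in one piece.
                paths.add(line.substring(1).trim());
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "M       solr/core/src/java/org/apache/solr/search/SolrReturnFields.java",
                "?       solr/core/src/test/org/apache/solr/search/MyNewTest.java");
        System.out.println(parse(sample));
        // prints [solr/core/src/test/org/apache/solr/search/MyNewTest.java]
    }
}
```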
Finally, type this command to generate a patch:
# svn diff > /work/patches/SOLR-XXXX.patch
That will create a new file (SOLR-XXXX.patch) in the /work/patches local folder. Here are a couple of things to note:
/work/patches is a sample local directory that I’ve created on my machine. You can put the patch in a different folder.
XXXX is supposed to be replaced with the number of the corresponding JIRA issue. If you are updating an existing patch, always keep this naming convention, because JIRA takes care of highlighting the newest version.
Tip: If you’ve installed an SVN plugin in your IDE (such as Subclipse or Subversive in Eclipse), you can do everything without using the command line. In Subclipse, for example, there’s a Create Patch item under Team that will guide you through the necessary steps with an easy wizard.
Once you’ve got the patch file, open a browser, log in to JIRA, go to the issue page, and upload the patch. It is recommended that you also post a comment with information (including a description) about your submission. That’s all! Now you should follow your issue, because several things can happen:
The patch is perfect, so it’s just a matter of time before it is applied.
Some questions come from JIRA users. In that case, you may want to participate in a discussion that might eventually lead to a request for a new version of the patch.
Anyway, the big part is done! You’ve actively participated in the contribution process, and hopefully your work will be integrated into Solr. Congrats!
Other ways to contribute
Besides writing code, there are other ways to participate in an open source project. After all, the software is just one component of a final product. Others are support and documentation, which in most cases make the real difference between a good and a bad product from the user’s perspective.
Documentation
Software quality is determined by a combination of several factors: functional and non-functional features, internal and external qualities, and, last but not least, documentation.
By “documentation”, I personally mean a complex and huge world made up of different types of information for different target audiences:
Technical internal documentation: Strictly needed by active developers, it describes the structure and the implementation of the system.
Technical external documentation: Crucial for open source projects that are frameworks, that is, things that can be extended. This is sometimes called the developer guide. This kind of information documents the public API and the extension points that let developers integrate the product with their applications.
User documentation: This enables end users to understand the usage and power of a given system. It is sometimes called a user guide and is the primary source of information for an end user.
Solr has two main places where documentation can be found:
The reference guide, available online at https://cwiki.apache.org/confluence/display/solr/About+This+Guide, or in PDF format
The Solr community Wiki, at https://wiki.apache.org/solr
The first is a guide constituting the official reference documentation. It is created and maintained by Solr committers. On the other hand, the Wiki is a public and collaborative tool. Anyone can potentially edit its content by creating an account and then requesting write permission from the Solr team. For detailed instructions, refer to http://wiki.apache.org/solr/#How_to_edit_this_Wiki.
Mailing list moderator
A list moderator is a kind of supervisor for a given mailing list: a user with elevated privileges. A moderator can get a list of all subscribers and manually subscribe or unsubscribe a given user.
The moderator checks emails sent to the list from addresses that are not subscribed, in order to improve spam filter rules, and also helps users who face list-related issues (for example, subscribing and unsubscribing).
Summary
In this final chapter, we illustrated the overall contribution process. Solr is an open source project, and its team warmly welcomes any kind of contribution: source code, bug fixes, documentation, and active participation in the mailing lists. There’s no need to be a committer, which would surely be an ambitious goal for a developer. It’s always possible to download the source code, change it, and then (if you think the changes could also be useful for other people) create a patch and submit it to the community.
www.it-ebooks.info
IndexA
addcommandabout/Addsending/Sendingaddcommands
addcommand,XMLformat<add>/AddcommitWithin/Addoverwrite/Add<doc>/Addboost/Add<field>/Add
Alchemyabout/UIMAMetadataExtractionLibrary
alternativequery/Alternativequeryanalyzersections/ThetextanalysisprocessApacheANT
URL/Settingupthedevelopmentenvironmentversioncontrol/Versioncontrol
ApachecontributionURL/Identifyingyourneeds
ApacheHadoopabout/MapReduce
ApachePOI/ContentExtractionLibraryApacheTikaframework/ContentExtractionLibraryApacheUIMA
about/UIMAMetadataExtractionLibraryApacheVelocity
about/RapidprototypingwithSolaritasApacheZookeeper
about/ClustermanagementURL/Clustermanagement
autocommitfeature/Updatehandlerandautocommitfeature
www.it-ebooks.info
Bbackgroundserver
Solr,runningas/DifferentwaystorunSolr,Backgroundserverbackup
about/Replicationfactor,leaders,andreplicasBooleanfields
about/BooleanBooleanparameters,servicebehavior
waitSearcher/Commit,optimize,androllbackwaitFlush/Commit,optimize,androllbacksoftCommit/Commit,optimize,androllback
Boostqueryparser/Otheravailableparsersbuilt-intransformers
ScriptTransformer/TransformersDateFormatTransformer/TransformersHTMLStripTransformer/TransformersLogTransformer/TransformersNumberFormatTransformer/TransformersRegexTransformer/TransformersTemplateTransformer/Transformers
www.it-ebooks.info
Ccache
about/CachesFilterCache/CachesQueryResultCache/CachesDocumentCache/CachesFieldCache/CachesFieldValueCache/CachesCustomCache/Cacheslifecycle/Cachelifecyclessizing/Cachesizingobjectslifecycle/CachedobjectlifecycleLRUCache/CachedobjectlifecycleFastLRUCache/CachedobjectlifecycleLFUCache/Cachedobjectlifecyclestats/Cachestatstypes/Typesofcache
cache,statslookups/Cachestatshits/Cachestatshitratio/Cachestatsinserts/Cachestatsevictions/Cachestatssize/CachestatswarmupTime/Cachestatscumulative_lookups/Cachestatscumulative_hits/Cachestatscumulative_hitratio/Cachestatscumulative_inserts/Cachestatscumulative_evictions/Cachestats
cache,typesfiltercache/Filtercachequeryresultcache/QueryResultcachedocumentcache/Documentcachefieldvaluecache/Fieldvaluecachecustomcache/Customcache
Carrot2projectabout/Clustering
changescreating/Makingyourchanges
charfilters/Charfiltersreferencelink/Charfilters
clusteringmodule
www.it-ebooks.info
about/ClusteringCollectionsAPI,actions
CREATE/CollectionsAPIRELOAD/CollectionsAPIDELETE/CollectionsAPILIST/CollectionsAPICREATESHARD/CollectionsAPISPLITSHARD/CollectionsAPIDELETESHARD/CollectionsAPICREATEALIAS/CollectionsAPIDELETEALIAS/CollectionsAPIADDREPLICA/CollectionsAPIDELETEREPLICA/CollectionsAPICLUSTERPROP/CollectionsAPIMIGRATE/CollectionsAPIADDROLE/CollectionsAPIREMOVEROLE/CollectionsAPIOVERSEERSTATUS/CollectionsAPICLUSTERSTATUS/CollectionsAPIREQUESTSTATUS/CollectionsAPIADDREPLICAPROP/CollectionsAPIDELETEREPLICAPROP/CollectionsAPIBALANCESHARDUNIQUE/CollectionsAPI
configurationparametersURL/ContentExtractionLibrary
ContentExtractionLibrary/ContentExtractionLibrarycopyfields/CopyfieldsCore
overview/CoreoverviewCoreAdmin
about/CoreAdmintoptoolbar/CoreAdmincentralarea/CoreAdmin
CoreAdmin,centralareastartTime/CoreAdmininstanceDir/CoreAdmindataDir/CoreAdminlastModified/CoreAdminversion/CoreAdminnumDocs/CoreAdminmaxDocs/CoreAdmindeletedDocs/CoreAdminoptimized/CoreAdmincurrent/CoreAdmin
www.it-ebooks.info
directory/CoreAdminCoreAdmin,toptoolbar
Unload/CoreAdminRename/CoreAdminSwap/CoreAdminReload/CoreAdminOptimize/CoreAdmin
customcache/Customcachecustomdata
indexing/Indexingcustomdatacustomresponsewriter
using/Usingacustomresponsewriter
www.it-ebooks.info
DDamerau-Levenshteindistancealgorithm/Fuzzydashboard
about/DashboardphysicalandJVMmemory/PhysicalandJVMmemorydisk/Diskusagefiledescriptors/Filedescriptors
databaserecordversusdocument/Thedocument
DataImportHandlermoduleabout/DataImportHandlerdatasources/Datasourcesentities/Documents,entities,andfieldsdocuments/Documents,entities,andfieldsfields/Documents,entities,andfieldstransformer/Transformersentityprocessors/Entityprocessorseventlisteners/Eventlisteners
datasourcesabout/DatasourcesJdbcDataSource/DatasourcesURLDataSource/DatasourcesBinURLDataSource/DatasourcesFileDataSource/DatasourcesBinFileDataSource/DatasourcesContentStreamDataSource/DatasourcesBinContentStreamDataSource/DatasourcesFieldReaderDataSource/DatasourcesFieldStreamDataSource/Datasources
dateformatabout/Date
defaultsimilarity/Defaultsimilaritydeletecommands
issuing/Deletedevelopmentenvironment
settingup/Settingupthedevelopmentenvironmentversioncontrol/Versioncontrolcodestyle/Codestylecode,checkingout/Checkingoutthecodeprojectcreating,inIDE/CreatingtheprojectinyourIDE
diamondarchitectureabout/Master/slavesscenario
Dis
www.it-ebooks.info
about/TheDisjunctionMaximumqueryparserMax/TheDisjunctionMaximumqueryparser
disjunctionmaxquery/Tiebreakerdisjunctionsumquery/TiebreakerDisMaxqueryparser
about/TheDisjunctionMaximumqueryparserqueryfields/QueryFieldsalternativequery/Alternativequeryminimumnumberofmatches/Minimumshouldmatchphrasefields/Phrasefieldsqueryphraseslop/Queryphraseslopphraseslop/Phraseslopboostqueries/Boostqueriesadditiveboostfunctions/Additiveboostfunctionstieparameter/Tiebreaker
Document/Inputandoutputdatatransferobjectsdocument
about/Thedocumentversusdatabaserecord/Thedocument
documentationabout/Documentationtechnicalinternaldocumentation/Documentationtechnicalexternaldocumentation/Documentationuserdocumentation/Documentation
documentcacheabout/Documentcache
documentsabout/Documents,entities,andfields
dynamicfields/Dynamicfields
www.it-ebooks.info
EEclipse
URL/CodestyleEclipseIDEforJavaDevelopers
URL/PrerequisiteseDisMaxqueryparser
about/TheExtendedDisjunctionMaximumqueryparserfieldedsearch/Fieldedsearchphrasebigramfield/Phrasebigramandtrigramfieldsphrasetrigramfield/Phrasebigramandtrigramfieldsphrasetrigramslop/Phrasebigramandtrigramslopphrasebigramslop/Phrasebigramandtrigramslopmultiplicativeboostfunction/Multiplicativeboostfunctionuserfields/Userfieldslowercaseoperators/Lowercaseoperators
ensembleabout/Clustermanagement
entitiesabout/Documents,entities,andfieldsrootentities/Documents,entities,andfieldssubentities/Documents,entities,andfields
EntityProcessorabout/Entityprocessors
entityprocessorsSqlEntityProcessor/EntityprocessorsFileListEntityProcessor/EntityprocessorsLineEntityProcessor/EntityprocessorsMailEntityProcessor/EntityprocessorsPlainTextEntityProcessor/EntityprocessorsSolrEntityProcessor/EntityprocessorsTikaEntityProcessor/EntityprocessorsXPathEntityProcessor/Entityprocessors
eventlistenersabout/Eventlisteners
extensionsabout/Otherextensionsclusteringmodule/ClusteringUIMAMetadataExtractionLibrary/UIMAMetadataExtractionLibraryMapReduce/MapReduce
www.it-ebooks.info
Ffacetcomponent
about/Facetfacetqueries/Facetqueriesfacetfields/Facetfieldsfacetranges/Facetrangespivotfacets/Pivotfacetsintervalfacets/Intervalfacets
facetedsearch/Facetfacetfields/Facetfields
facet.field/Facetfieldsfacet.prefix/Facetfieldsfacet.sort/Facetfieldsfacet.limit/Facetfieldsfacet.offset/Facetfieldsfacet.mincount/Facetfieldsfacet.missing/Facetfieldsfacet.method/Facetfieldsfacet.threads/Facetfields
facetqueries/Facetqueriesfacetranges
about/Facetrangesfacet.range/Facetrangesfacet.range.start/Facetrangesfacet.range.end/Facetrangesfacet.range.gap/Facetranges
facets/FacetFactoryclass/ChangingthestoredvalueoffieldsFastLRUCache/Cachedobjectlifecyclefastvectorhighlighter/Fastvectorhighlighterfieldedsearch/Fieldedsearchfieldlists/FieldlistsFieldqueryparser/Otheravailableparsersfields
about/Documents,entities,andfieldsfields,Solrschema
about/Fieldsstatic/Staticfieldsdynamic/Dynamicfieldscopy/Copyfields
fieldsattributes,Solrschemaname/Fieldstype/Fields
www.it-ebooks.info
indexed/Fieldsstored/Fieldsrequired/Fieldsdefault/FieldssortMissingFirst/FieldssortMissingLast/FieldsomitNorms/FieldsomitPositions/FieldsomitTermFreqAndPositions/FieldstermVectors/FieldsdocValues/Fields
fieldtypes,Solrschemaabout/Fieldtypestextanalysisprocess/Thetextanalysisprocesscharfilters/Charfilterstokenizer/Tokenizerstokenfilters/Tokenfiltersimplementing/Puttingitalltogetherreferencelink/Someexamplefieldtypes
fieldtypesattributes,Solrschemaname/Fieldtypestype/FieldtypessortMissingFirst/FieldtypessortMissingLast/Fieldtypesindexed/Fieldtypesstored/FieldtypesmultiValued/FieldtypesomitNorms/FieldtypesomitTermsAndFrequencyPositions/FieldtypesomitPositions/FieldtypespositionsIncrementGap/FieldtypesautogeneratePhraseQueries/Fieldtypescompressed/FieldtypescompressThreshold/Fieldtypes
fieldtypesexamples,Solrschemaabout/Someexamplefieldtypesstring/Stringnumeric/NumbersBooleanfields/Booleandate/Datetext/Textcurrency/Othertypesbinary/Othertypesgeospatialtypes/Othertypes
www.it-ebooks.info
random/Othertypesfieldvaluecache/Fieldvaluecachefiledescriptors/Filedescriptorsfiltercache
about/Filtercachefilterqueries/FilterqueriesFirstQueryITCaseintegrationtest/Integrationtestserverflparameter
about/FieldlistsFunctionqueryparser/Otheravailableparsersfuzzyquery/Fuzzy
www.it-ebooks.info
Hhardcommit/Updatehandlerandautocommitfeaturehighavailability
about/Replicationfactor,leaders,andreplicashighlightcomponent
about/Highlightingparameters/Highlightingstandardhighlighter/Standardhighlighterfastvectorhighlighter/Fastvectorhighlighterpostingshighlighter/Postingshighlighter
http/Versioncontrolhttps/Versioncontrol
www.it-ebooks.info
I<indexConfig>section,attributes
writeLockTimeout/IndexconfigurationmaxIndexingThreads/IndexconfigurationuseCompoundFile/IndexconfigurationramBufferSizeMB/IndexconfigurationramBufferSizeDocs/IndexconfigurationmergePolicy/IndexconfigurationmergeFactor/IndexconfigurationmergeScheduler/IndexconfigurationlockType/Indexconfiguration
IDEproject,creating/CreatingtheprojectinyourIDE
indexedfieldsabout/String
indexingconfigurationabout/Solrindexingconfiguration,Indexconfigurationgeneralsettings/Generalsettingsupdatehandler/Updatehandlerandautocommitfeatureautocommitfeature/UpdatehandlerandautocommitfeatureRequestHandler/RequestHandlerUpdateRequestProcessor/UpdateRequestProcessor
indexoperationsabout/Indexoperationsadd/Adddeletecommands,issuing/Deletecommit/Commit,optimize,androllbackoptimize/Commit,optimize,androllbackrollback/Commit,optimize,androllback
indexprocessextending/Extendingandcustomizingtheindexprocess
integrationtestserverSolr,runningas/DifferentwaystorunSolr,Integrationtestserver
IntelliJURL/Codestyle
intervalfacets/IntervalfacetsInverseDocumentFrequency(IDF)/Shardsinvertedindex
about/Theinvertedindex
www.it-ebooks.info
J
Java
  URL, for downloading / Prerequisites
Java Development Kit 7 (JDK) / Prerequisites
Java properties
  and thread dump / Java properties and thread dump
Java Virtual Machine (JVM) / Prerequisites
JConsole / JMX
JIRA
  signing up / Signing up on JIRA
  signing up, URL / Signing up on JIRA
  login form, URL / Signing up on JIRA
JMX
  about / JMX
  URL / JMX
Join query parser / Other available parsers
JVisualVM / JMX
JVM memory
  and physical / Physical and JVM memory
JVM options
  URL / Physical and JVM memory
L
language identifier
  about / Language Identifier
LFUCache / Cached object lifecycle
list moderator
  about / Mailing list moderator
load balancing
  about / Replication factor, leaders, and replicas
logging
  about / Logging
LRUCache / Cached object lifecycle
Lucene index / File descriptors
Lucene query parser / Other available parsers
M
M2Eclipse (M2E) / Prerequisites
mailing lists
  subscribing to / Subscribing to mailing lists
Management Beans (MBeans) / JMX
MapReduce
  about / MapReduce
MARCXML / An example – SOLR-3191
master/slave scenario
  about / Master/slaves scenario
Maven Cargo Plugin
  URL / Understanding the project structure
more like this search component
  about / More like this
  parameters / More like this
N
1*n relationship / Documents, entities, and fields
numeric type
  about / Numbers
O
Online Public Access Catalogue (OPAC) / An example – SOLR-3191
Online Public Application Catalogue (OPAC) / Fields
OpenCalais
  about / UIMA Metadata Extraction Library
operators
  AND / Terms, fields, and operators
  OR / Terms, fields, and operators
  + / Terms, fields, and operators
  -/NOT / Terms, fields, and operators
optimize
  about / Commit, optimize, and rollback
P
patch
  submitting / Creating and submitting a patch
  creating / Creating and submitting a patch
PDFBox / Content Extraction Library
phrase fields / Phrase fields
pivot facets / Pivot facets
postings highlighter / Postings highlighter
Processor class / Changing the stored value of fields
project structure, Solr development environment
  about / Understanding the project structure
  src/main/java / Understanding the project structure
  src/main/resources / Understanding the project structure
  src/test/java / Understanding the project structure
  src/test/resources / Understanding the project structure
  src/dev/eclipse / Understanding the project structure
  src/solr-home / Understanding the project structure
  pom.xml / Understanding the project structure
Q
query analyzers / Query analyzers
query fields / Query Fields
query handlers
  about / Query handlers
  handlerStart attribute / Query handlers
  requests attribute / Query handlers
  errors attribute / Query handlers
  timeouts attribute / Query handlers
  totalTime attribute / Query handlers
  avgRequestsPerSecond attribute / Query handlers
  avgTimePerRequest attribute / Query handlers
querying
  about / Querying
  search-related configuration / Search-related configuration
  query analyzers / Query analyzers
  query parameters / Common query parameters
query language
  about / Querying
query parameters
  about / Common query parameters, Query parameters
  q / Common query parameters
  start / Common query parameters
  rows / Common query parameters
  sort / Common query parameters
  defType / Common query parameters
  fl / Common query parameters
  fq / Common query parameters
  wt / Common query parameters
  debugQuery / Common query parameters
  explainOther / Common query parameters
  timeAllowed / Common query parameters
  cache / Common query parameters
  omitHeader / Common query parameters
  field lists / Field lists
  filter queries / Filter queries
  defaults / Query parameters
  appends / Query parameters
  invariants / Query parameters
query parser
  about / Query parsers
  Solr query parser / The Solr query parser
  DisMax query parser / The Disjunction Maximum query parser
  eDisMax query parser / The Extended Disjunction Maximum query parser
query phrase slop / Query phrase slop
query result cache
  about / QueryResult cache
R
range searches / Ranges
rapid prototyping, Solaritas / Rapid prototyping with Solaritas
Raw query parser / Other available parsers
RealTimeGetHandler / RealTimeGetHandler
repeater
  about / Master/slaves scenario
replica
  about / Replication factor, leaders, and replicas
replication factor
  about / Replication factor, leaders, and replicas
replication mechanism
  commit / Master/slaves scenario
  optimize / Master/slaves scenario
  startup / Master/slaves scenario
repository tree
  URL / Version control
RequestHandler / RequestHandler
response output writers
  about / Response output writers
  xml / Response output writers
  xslt / Response output writers
  json / Response output writers
  csv / Response output writers
  velocity / Response output writers
  javabin / Response output writers
  python / Response output writers
  ruby / Response output writers
  php / Response output writers
rollback / Commit, optimize, and rollback
root-entities / Documents, entities, and fields
S
sample project
  about / The sample project
schema.xml file / schema.xml
schema sections
  about / Other schema sections
  unique key / Unique key
  default similarity / Default similarity
search-related configuration
  about / Search-related configuration
  settings / Search-related configuration
search component
  about / Search components
  query / Query
  facet / Facet
  highlight / Highlighting
  more like this / More like this
  query elevation / Other components
  terms / Other components
  stats / Other components
  spellcheck / Other components
  term vector / Other components
  debug / Other components
search components / Search components
search handler
  about / Search handler
  standard request handler / Standard request handler
  RealTimeGetHandler / RealTimeGetHandler
shards
  about / Shards
  URL / Shards
  using / Shards
  with replication / Shards with replication
size-estimator-lucene-solr.xls
  URL / Prerequisites
soft commit / Update handler and autocommit feature
Solid State Disks (SSD) / Disk usage
Solr
  latest version, downloading / Downloading the right version
  URL, for download bundle / Downloading the right version
  server, setting up / Setting up and running the server
  server, running / Setting up and running the server
  running, as background server / Different ways to run Solr, Background server
  running, as integration test server / Different ways to run Solr, Integration test server
  about / What do we have installed?, Extending Solr
  other resources / Other resources
  real-time and indexed data, mixing / Mixing real-time and indexed data
  custom response writer, using / Using a custom response writer
  data, adding to / Adds and deletes
  data, deleting / Adds and deletes
  searching with / Search
  bindings / Other bindings
  requirements, identifying / Identifying your needs
  reference guide, URL / Documentation
  URL / Documentation
Solr, clients
  URL / Other bindings
SOLR-3191
  about / An example – SOLR-3191
  URL / An example – SOLR-3191
solr-x.y.z directory / Setting up and running the server
solr.xml / solr.xml
SolrCloud
  about / SolrServer – the Solr façade, SolrCloud
  URL / SolrCloud
  cluster management / Cluster management
  replication factor / Replication factor, leaders, and replicas
  leaders / Replication factor, leaders, and replicas
  replicas / Replication factor, leaders, and replicas
  durability / Durability and recovery
  recovery / Durability and recovery
  features / The new terminology
  administration console / Administration console
  Collections API / Collections API
  distributed search / Distributed search
  cluster-aware index / Cluster-aware index
Solr community Wiki
  URL / Documentation
solrconfig.xml file / solrconfig.xml
Solr core
  about / The Solr core
Solr data model
  about / Understanding the Solr data model
  document / The document
  inverted index / The inverted index
Solr development environment
  setting up / Setting up a Solr development environment
  prerequisites / Prerequisites
  sample project, importing / Importing the sample project of this chapter
  project structure / Understanding the project structure
Solr extension, GitHub
  URL / Commit, optimize, and rollback
Solr home
  about / Solr home
Solr index / File descriptors
Solritas
  about / Rapid prototyping with Solaritas
  rapid prototyping / Rapid prototyping with Solaritas
Solrj
  about / Solrj
  SolrServer / SolrServer – the Solr façade
  input data transfer object / Input and output data transfer objects
  output data transfer object / Input and output data transfer objects
Solr query parser
  about / The Solr query parser
  terms / Terms, fields, and operators
  fields / Terms, fields, and operators
  operators / Terms, fields, and operators
  boosts / Boosts
  wildcard characters / Wildcards
  fuzzy query / Fuzzy
  proximity / Proximity
  range searches / Ranges
Solr schema
  about / The Solr schema
  field types / Field types
  fields / Fields
SolrServer
  about / SolrServer – the Solr façade
  EmbeddedSolrServer / SolrServer – the Solr façade
  HttpSolrServer / SolrServer – the Solr façade
  LBHttpSolrServer / SolrServer – the Solr façade
  ConcurrentUpdateSolrServer / SolrServer – the Solr façade
  CloudSolrServer / SolrServer – the Solr façade
Solr source repository
  URL / Physical and JVM memory
sort fields
  about / String
Spatial filter query parser / Other available parsers
SQLEntityProcessor
  about / Entity processors
standalone instance, of Solr
  about / Standalone instance
standalone Solr instance
  installing / Installing a standalone Solr instance
  prerequisites / Prerequisites
standard highlighter / Standard highlighter
standard request handler
  about / Standard request handler
  search components / Search components
  query parameters / Query parameters
static fields / Static fields
stored value, of fields
  modifying / Changing the stored value of fields
string type
  about / String
  indexed fields / String
  sort fields / String
sub-entities / Documents, entities, and fields
subversion
  about / Version control
Surround query parser / Other available parsers
T
technical external documentation / Documentation
technical internal documentation / Documentation
term
  about / The text analysis process
Term query parser / Other available parsers
text
  about / Text
text analysis process
  about / The text analysis process
  position increment / The text analysis process
  start and end offset / The text analysis process
  payload / The text analysis process
thread dump and Java properties / Java properties and thread dump
thresholds, for triggering auto-commits
  maxDocs / Update handler and autocommit feature
  maxTime / Update handler and autocommit feature
tie parameter / Tie breaker
token filters
  about / Token filters
  reference link / Token filters
tokenizer
  about / Tokenizers
  reference link / Tokenizers
transformer
  about / Transformers
transformers
  URL / Field lists
troubleshooting
  about / Troubleshooting, Troubleshooting
  UnsupportedClassVersionError error / UnsupportedClassVersionError
  failed to read artifact descriptor / The “Failed to read artifact descriptor” message
  multivalued fields / Multivalued fields and the copyField directive
  copyField directive / Multivalued fields and the copyField directive, Required fields and the copyField directive
  copyField input value / The copyField input value
  required fields / Required fields and the copyField directive
  stored text, immutable / Stored text is immutable!
  data not indexed / Data not indexed
troubleshooting, Solr
  about / Troubleshooting, No score is returned in response
U
UIMA Metadata Extraction Library / UIMA Metadata Extraction Library
unique key / Unique key
UnsupportedClassVersionError error / UnsupportedClassVersionError
update handler / Update handler and autocommit feature
update handlers
  about / Update handlers
  commits attribute / Update handlers
  autocommit maxTime attribute / Update handlers
  autocommits attribute / Update handlers
  soft autocommits attribute / Update handlers
  optimizes attribute / Update handlers
  rollbacks attribute / Update handlers
  expungeDeletes attribute / Update handlers
  docsPending attribute / Update handlers
  adds attribute / Update handlers
  deletesById attribute / Update handlers
  deletesByQuery attribute / Update handlers
  errors attribute / Update handlers
  cumulative_adds / Update handlers
  cumulative_deletesById / Update handlers
  cumulative_deletesByQuery / Update handlers
  cumulative_errors / Update handlers
UpdateRequestProcessor / UpdateRequestProcessor
user documentation / Documentation
user guide / Documentation