Post on 15-Oct-2020

www.it-ebooks.info

Apache Solr Essentials

Table of Contents

Apache Solr Essentials
Credits
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Get Me Up and Running
Installing a standalone Solr instance
Prerequisites
Downloading the right version
Setting up and running the server
Setting up a Solr development environment
Prerequisites
Importing the sample project of this chapter
Understanding the project structure
Different ways to run Solr
Background server
Integration test server
What do we have installed?
Solr home
solr.xml
schema.xml
solrconfig.xml
Other resources
Troubleshooting
UnsupportedClassVersionError
The “Failed to read artifact descriptor” message
Summary
2. Indexing Your Data
Understanding the Solr data model
The document
The inverted index
The Solr core
The Solr schema
Field types
The text analysis process
Char filters
Tokenizers
Token filters
Putting it all together
Some example field types
String
Numbers
Boolean
Date
Text
Other types
Fields
Static fields
Dynamic fields
Copy fields
Other schema sections
Unique key
Default similarity
Solr indexing configuration
General settings
Index configuration
Update handler and autocommit feature
RequestHandler
UpdateRequestProcessor
Index operations
Add
Sending add commands
Delete
Commit, optimize, and rollback
Extending and customizing the index process
Changing the stored value of fields
Indexing custom data
Troubleshooting
Multivalued fields and the copyField directive
The copyField input value
Required fields and the copyField directive
Stored text is immutable!
Data not indexed
Summary
3. Searching Your Data
The sample project
Querying
Search-related configuration
Query analyzers
Common query parameters
Field lists
Filter queries
Query parsers
The Solr query parser
Terms, fields, and operators
Boosts
Wildcards
Fuzzy
Proximity
Ranges
The Disjunction Maximum query parser
Query Fields
Alternative query
Minimum should match
Phrase fields
Query phrase slop
Phrase slop
Boost queries
Additive boost functions
Tiebreaker
The Extended Disjunction Maximum query parser
Fielded search
Phrase bigram and trigram fields
Phrase bigram and trigram slop
Multiplicative boost function
User fields
Lowercase operators
Other available parsers
Search components
Query
Facet
Facet queries
Facet fields
Facet ranges
Pivot facets
Interval facets
Highlighting
Standard highlighter
Fast vector highlighter
Postings highlighter
More like this
Other components
Search handler
Standard request handler
Search components
Query parameters
RealTimeGetHandler
Response output writers
Extending Solr
Mixing real-time and indexed data
Using a custom response writer
Troubleshooting
Queries don’t match expected documents
Mismatch between index and query analyzer
No score is returned in response
Summary
4. Client API
Solrj
SolrServer – the Solr façade
Input and output data transfer objects
Adds and deletes
Search
Other bindings
Summary
5. Administering and Tuning Solr
Dashboard
Physical and JVM memory
Disk usage
File descriptors
Logging
Core Admin
Java properties and thread dump
Core overview
Caches
Cache lifecycles
Cache sizing
Cached object lifecycle
Cache stats
Types of cache
Filter cache
Query Result cache
Document cache
Field value cache
Custom cache
Query handlers
Update handlers
JMX
Summary
6. Deployment Scenarios
Standalone instance
Shards
Master/slaves scenario
Shards with replication
SolrCloud
Cluster management
Replication factor, leaders, and replicas
Durability and recovery
The new terminology
Administration console
Collections API
Distributed search
Cluster-aware index
Summary
7. Solr Extensions
DataImportHandler
Data sources
Documents, entities, and fields
Transformers
Entity processors
Event listeners
Content Extraction Library
Language Identifier
Rapid prototyping with Solaritas
Other extensions
Clustering
UIMA Metadata Extraction Library
MapReduce
Summary
8. Contributing to Solr
Identifying your needs
An example – SOLR-3191
Subscribing to mailing lists
Signing up on JIRA
Setting up the development environment
Version control
Code style
Checking out the code
Creating the project in your IDE
Making your changes
Creating and submitting a patch
Other ways to contribute
Documentation
Mailing list moderator
Summary
Index

Apache Solr Essentials

Copyright © 2015 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: February 2015

Production reference: 1210215

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78439-964-1

www.packtpub.com

Credits

Author
Andrea Gazzarini

Reviewers
Ahmad Maher Abdelwhab
Markus Klose
Julian Lam
Puneet Singh Ludu

Commissioning Editor
Usha Iyer

Acquisition Editor
Larissa Pinto

Content Development Editor
Kirti Patil

Technical Editor
Ankur Ghiye

Copy Editor
Vikrant Phadke

Project Coordinator
Nidhi J. Joshi

Proofreaders
Stephen Copestake
Maria Gould
Bernadette Watkins

Indexer
Priya Sane

Graphics
Abhinash Sahu

Production Coordinator
Shantanu N. Zagade

Cover Work
Shantanu N. Zagade

About the Author

Andrea Gazzarini is a software engineer. He has mainly focused on the Java technology. Although often involved in analysis and design, he strongly loves coding and definitely likes to be considered a developer.

Andrea has more than 15 years of experience in various software branches, from telecom to banking software. He has worked for several medium- and large-scale companies, such as IBM and Orga Systems.

Andrea has several certifications in the Java programming language (programmer, developer, web component developer, business component developer, and JEE architect), BEA products (build and portal solutions), and Apache Solr (Lucid Apache Solr/Lucene Certified Developer).

In 2009, Andrea stepped into the wonderful world of open source projects, and in the same year, he became a committer for the Apache Qpid project. His adventure with Solr began in 2010, when he joined @Cult, an Italian company that mainly focuses its projects on library management systems, online access public catalogs, and linked data.

He’s currently involved in several (too many!) projects, always thinking about a “big” idea that will change his (developer) life.

Acknowledgments

I’d like to begin by thanking the people who made this book what it is. Writing a book is not a single person’s work, and help from experienced people that guide you along the path is crucial. Many thanks to Larissa, Kirti, Ankur, and Vikrant for supporting me in this process.

I am also grateful to the technical reviewers of the book, Ahmad Maher Abdelwhab, Markus Klose, Puneet Singh Ludu, and Julian Lam, for carefully reading my drafts and spotting (hopefully) most of my mistakes. This book would not have been so good without their help and input.

In general, I want to thank everyone who directly or indirectly helped me in creating this book, except for a long-sighted teacher who once told me when I was in university, “Hey, guy with all those earrings! You won’t go anywhere!”

Finally, a special thought to my family; to my girls, the actual supporters of the book; my wonderful wife, Nicoletta (to whom I promise not to write another book), my pride and joy, Sofia and Caterina, and my first actual teacher, my mom, Lina. They are the people who really made sacrifices while I was writing and who definitely deserve the credits for the book.

Once again, thank you!

About the Reviewers

Ahmad Maher Abdelwhab is currently working at Knowledgeware Technologies as an open source developer. He has over 10 years of experience, with special development skills in PHP, Drupal, Perl, Ruby on Rails, Java, XML, XSL, MySQL, PostgreSQL, MongoDB, SQL, and Linux. He graduated in computer science from Mansoura University in 2005.

I would like to thank my father, mother, and sincere wife for their continuous support while reviewing this book.

Markus Klose is a search and big data consultant at SHI GmbH & Co. KG in Germany. He is in charge of project management and supervision, project analysis, and delivering consulting and training services.

Most of Markus’ daily business is related to Apache Solr, Elasticsearch, and Fast ESP. He travels across Germany, Switzerland, and Austria to provide his services and knowledge.

On a regular basis, you can find him at meets, user groups, or conferences such as Berlin Buzzwords or Solr Revolution, where he speaks about Apache Solr.

Besides search-related training and consulting, he is currently establishing additional areas of work. He uses tools such as Logstash and Kibana to fulfill customer requirements in monitoring and analytics.

Thanks to the experience gained from his daily work, Markus wrote the first German book on Apache Solr (Einführung in Apache Solr) with his colleague, Daniel Wrigley. It was published by O’Reilly in February 2014.

Besides writing, Markus spends a lot of his free time using his knowledge and programming skills to work on and contribute to open source projects such as Latin stemmer and number converter for Solr (https://issues.apache.org/jira/browse/LUCENE-4229) and SolrAppender for log4j2 (https://issues.apache.org/jira/browse/LOG4J2-618).

Julian Lam is a cofounder and core maintainer of NodeBB, a type of free and open source forum software built upon modern web tools, such as Node.js and Redis. He has spoken several times on topics related to Javascript in the workplace and best practices for hiring. Julian is an advocate of client-side rendering, which can be used to build highly performant web applications.

www.PacktPub.com

Support files, eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <service@packtpub.com> for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt’s online digital book library. Here, you can search, access, and read Packt’s entire library of books.

Why subscribe?

Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser

Free access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.

Hi Dad, when you bought me my first computer, you had no idea what was coming next…

Preface

As you may have guessed from the title, this is a book about Apache Solr, specifically about Solr essentials. What do I mean by essentials? Nice question! Such a term can be seen from so many perspectives. Solr, mainly from 2010 onwards, witnessed exponential growth in terms of popularity, stakeholders, community, and the capabilities it offers. This rapid growth reflects the rich portfolio of the things that have been developed in these years and are nowadays available. So, strictly speaking, it’s not so easy to define the “essentials” of Solr.

The perspective that I will use to explain the term “essentials” is quite simple and pragmatic. I will describe the building blocks of Apache Solr, and at the same time, I will try to put my personal experience on those topics. In recent years, I’ve worked with Solr in several projects. As a user, I had to learn how to install, configure, tune, troubleshoot, and monitor Solr. As a developer, things were different for me. If you’re working in the IT domain and you’re reading this book (I guess you are), you probably know that each time you try to implement a solution, there’s something in the project that a specific tool doesn’t cover. So, after spending a lot of time analyzing, reading documentation, searching on the Internet, reading Wikis, and so on, you realize that you need to add a custom piece of code somewhere. That’s because “the product covers the 99.9999 percent of the possible scenarios but…” For this specific case, if this happens or that happens, you always fall under that 0.0001 percent. I don’t know about you, but for me, this has always been so. No matter what the project, the company, or the team is, this has been an implicit constant of every project, always.

That’s the reason I will try as much as possible to explain things throughout the book using real-world examples directly coming from my personal experience. I hope this additional perspective will be useful for better understanding of what is considered the most popular open source search platform.

What this book covers

Chapter 1, Get Me Up and Running, introduces the basic concepts of Solr and provides you with all the necessary steps to quickly get it up and running.

Chapter 2, Indexing Your Data, begins our first detailed discussion on Solr. In this chapter, we look at the data indexing process and see how it can be configured, tuned, and customized. This is also where we encounter the first line of code.

Chapter 3, Searching Your Data, explores the other, specular side of Solr. First, we stored our data; now we explore all that Solr offers in terms of search services.

Chapter 4, Client API, covers client-side usage of Solr libraries, providing a description of the main use cases from a client’s perspective.

Chapter 5, Administering and Tuning Solr, takes you through the available tools for configuring, managing, and tuning Solr.

Chapter 6, Deployment Scenarios, illustrates the various ways in which you can deploy Solr, from a standalone instance to a distributed cluster.

Chapter 7, Solr Extensions, describes several available Solr extensions and how they can be useful in solving common concrete use cases.

Chapter 8, Contributing to Solr, explains the wonderful world of open source software by illustrating the component parts of the process of participation and contribution.

What you need for this book

In order to be able to run the code examples in the book, you will need the Java Development Kit (JDK) 1.7 and Apache Maven.

Alternatively, you will need an Integrated Development Environment (IDE). Eclipse is strongly recommended as it is the same environment I used to capture the screenshots. However, even if you want to use another IDE, the steps should be quite similar.

The difference between the two alternatives mainly resides in the role that you want to assume during the reading. While you may want to only start and execute the examples as a user, you would surely want to see the working code in a usable environment as a developer. That’s the reason an IDE is strongly recommended in the second case.

The first chapter will provide the instructions necessary for installing all that you’ll need through the book.

Who this book is for

This book is targeted at people (users and developers) who are new to Apache Solr or are experienced with a similar product. The book will gradually help you to understand the focal concepts of Solr with the help of practical tips and real-world use cases. Although all the examples associated with the book can be executed with a few simple commands, a familiarity with the Java programming language is required for a good understanding.

Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and explanations of their meanings.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: “Each folder has a subfolder called conf where the configuration for that specific core resides.”

A block of code is set as follows:

[
  {"id": 1, "title": "The Birthday Concert"},
  {"id": 2, "title": "Live in Italy"},
  {"id": 3, "title": "Live in Paderborn"}
]

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>

Any command-line input or output is written as follows:

# mvn cargo:run -PfieldAnalysis

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: “Choose a field type or a field. Then press the Analyse Values button.”

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book: what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail <feedback@packtpub.com>, and mention the book’s title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Alternatively, you can also download the examples from GitHub, on https://github.com/agazzarini/apache-solr-essentials. There, you can download the whole content as a zip file from https://github.com/agazzarini/apache-solr-essentials/archive/master.zip or, if you have git installed on your machine, you can clone the repository by issuing the following command:

# git clone https://github.com/agazzarini/apache-solr-essentials.git <path-to-your-work-dir>

Where <path-to-your-work-dir> is the destination folder where the project will be cloned.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books (maybe a mistake in the text or the code), we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <copyright@packtpub.com> with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at <questions@packtpub.com>, and we will do our best to address the problem.

Chapter 1. Get Me Up and Running

This chapter describes how to install Solr and focuses on all the required steps to get a complete study and development environment that will guide us through the book.

Specifically, according to the double perspective previously described, I will illustrate two kinds of installations. The first is the installation of a standalone Solr instance (this is very quick). This is a simple task because the download bundle is preconfigured with all that you need to get your first taste of the product. As a developer, the second perspective is what I really need every day in my ordinary job: a working integrated development environment where I can run and debug Solr with my configurations and customizations, without having to manage an external server. In general, such an environment will have all that I need in one place for developing, debugging, and running unit and integration tests.

By the end of the chapter, you will have a running Solr instance on your machine, a ready-to-use Integrated Development Environment (IDE), and a good understanding of some basic concepts.

This chapter will cover the following topics:

Installation of a simple, standalone Solr instance from scratch
Setting up of an Integrated Development Environment
A quick overview about what we installed
Troubleshooting

Installing a standalone Solr instance

Solr is available for download as an archive that, once uncompressed, contains a fully working instance within a Jetty servlet engine. So the steps here should be pretty easy.

Prerequisites

In this section, we will describe a couple of prerequisites for the machine where Solr needs to be installed.

First of all, Java 6 or 7 is required: the exact choice depends on which version of Solr you want to install. In general, regardless of the version, make sure you have the latest update of your Java Virtual Machine (JVM). The following table describes the association between the latest Solr and Java versions:

Solr version    Java version
4.7.x           Java 6 or greater
4.8.x           Java 7 (update 55) or greater; Java 8 is verified to be compatible
4.9.x           Java 7 (update 55) or greater; Java 8 is verified to be compatible
4.10.x          Java 7 (update 55) or greater

Java can be downloaded from http://www.oracle.com/technetwork/java/javase/downloads/index.html.
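The version table above can be turned into a quick check of the JVM you are about to use. The following is only a sketch: the class and method names are made up for illustration and are not part of Solr, and it compares major versions only (it does not verify the "update 55" level).

```java
// Illustrative helper (not part of Solr): encodes the Solr/Java table above.
final class SolrJavaRequirement {

    /** Returns the minimum Java major version (6 or 7) for a Solr 4.7.x to 4.10.x release. */
    static int minimumJavaMajor(String solrVersion) {
        String[] parts = solrVersion.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        if (major != 4 || minor < 7 || minor > 10) {
            throw new IllegalArgumentException("Version not covered by the table: " + solrVersion);
        }
        // 4.7.x runs on Java 6 or greater; 4.8.x and later need Java 7 or greater.
        return minor == 7 ? 6 : 7;
    }

    /** Parses the major version from a java.version value such as "1.7.0_55" or "17.0.2". */
    static int javaMajor(String javaVersion) {
        String[] parts = javaVersion.split("[._-]");
        int first = Integer.parseInt(parts[0]);
        // Before Java 9 the property looked like "1.7.0_55", so the major is the second number.
        return first == 1 && parts.length > 1 ? Integer.parseInt(parts[1]) : first;
    }

    public static void main(String[] args) {
        String solrVersion = args.length > 0 ? args[0] : "4.10.3";
        int required = minimumJavaMajor(solrVersion);
        int running = javaMajor(System.getProperty("java.version"));
        System.out.println("Solr " + solrVersion + " needs Java " + required
                + " or greater, running on Java " + running
                + (running >= required ? " (OK)" : " (too old)"));
    }
}
```

Pass the Solr version you plan to install as the first argument; without arguments the sketch assumes 4.10.3, the latest version mentioned in this chapter.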

Other factors such as CPU, RAM, and disk space strongly depend on what you are going to do with this Solr installation. Nowadays, it shouldn’t be hard to have a couple of GB available on your workstation. However, bear in mind that at this moment I’m playing with Solr 4.9.0 installed on a Raspberry Pi (its RAM is 512 MB). I gave Solr a maximum heap (-Xmx) of 256 MB, indexed about 500 documents, and executed some queries without any problem. But again, those factors really depend on what you want to do: we could say that, assuming you’re using a modern PC for a study instance, hardware resources shouldn’t be a problem.

Instead, if you are planning a Solr installation in a test or in a production environment, you can find a useful spreadsheet at https://svn.apache.org/repos/asf/lucene/dev/trunk/dev-tools/size-estimator-lucene-solr.xls.

Although it cannot encompass all the peculiarities of your environment, it is definitely a good starting point for RAM and disk space estimation.
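If you want to confirm that a maximum heap setting such as the -Xmx value mentioned above is really being applied, a tiny program can print what the JVM reports. This is a minimal sketch with an illustrative class name; the reported figure may be slightly lower than the configured value, depending on the JVM.

```java
// Illustrative helper: prints the maximum heap the current JVM will try to use.
final class HeapReport {

    /** Converts a byte count to whole megabytes (1 MB = 1024 * 1024 bytes). */
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // maxMemory() follows the -Xmx setting of the JVM that runs this class.
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("Maximum heap: " + toMegabytes(maxHeap) + " MB");
    }
}
```

For example, after compiling the class you could run it with java -Xmx256m HeapReport to mirror the 256 MB setting described above.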

Downloading the right version

The latest version of Solr at the time of writing is 4.10.3, but a lot of things we will discuss in the book are valid for previous versions as well.

You might already have Solr somewhere and might not want to redownload another instance, your customer might already have a previous version, or, in general, you might not want the latest version. Therefore, I will try to refer to several versions in the book (from 4.7.x to 4.10.x) as often as possible. Each time a feature is described, I will indicate the version where it appeared first.

The download bundle is usually available as a tgz or zip archive. You can find that at https://lucene.apache.org/solr/downloads.html.

Setting up and running the server

Once the Solr bundle has been downloaded, extract it in a folder. We will refer to that folder as $INSTALL_DIR. Type the following command to extract the Solr bundle:

# tar -xvf $DOWNLOAD_DIR/solr-x.y.z.tar.gz -C $INSTALL_DIR

or

# unzip $DOWNLOAD_DIR/solr-x.y.z.zip -d $INSTALL_DIR

depending on the format of the bundle.

At the end, you will find a new solr-x.y.z folder in your $INSTALL_DIR folder. This folder will act as a container for all Solr instances you may want to play with. Here is a screenshot of the solr-x.y.z folder on my machine, where you can see I have three Solr versions:

The solr-x.y.z directory contains Jetty, a fast and small servlet engine, with Solr already deployed inside. So, in order to start Solr, we need to start Jetty. Open a new shell and type the following commands:

# cd $INSTALL_DIR/solr-x.y.z/example
# java -jar start.jar

You should see a lot of log messages ending with something like this:

...
[INFO] org.eclipse.jetty.server.AbstractConnector – Started SocketConnector@0.0.0.0:8983
...
[INFO] org.apache.solr.core.SolrCore – [collection1] Registered new searcher Searcher@66b664d7[collection1] main{StandardDirectoryReader(segments_2:3:nrt _0(4.9):C32)}

These messages tell you Solr is up and running! Open a web browser and type http://127.0.0.1:8983/solr.

You should see the following page:

This is the Solr administration console.
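The same check can be done without a browser. The sketch below uses only the JDK's HttpURLConnection against the URL given above; the class name is illustrative, and it assumes that any 2xx answer from that address means the server is up (it does not inspect the page content).

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Illustrative helper: checks whether something answers at the Solr base URL.
final class SolrUpCheck {

    /** Builds the base Solr URL, for example http://127.0.0.1:8983/solr. */
    static String baseUrl(String host, int port) {
        return "http://" + host + ":" + port + "/solr";
    }

    /** Returns true when the URL answers with a 2xx status within the timeouts. */
    static boolean isUp(String url) {
        try {
            HttpURLConnection connection =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            connection.setConnectTimeout(2000);
            connection.setReadTimeout(2000);
            connection.setRequestMethod("GET");
            int status = connection.getResponseCode();
            connection.disconnect();
            return status >= 200 && status < 300;
        } catch (IOException e) {
            // Connection refused, timeouts, and similar failures all mean "not reachable".
            return false;
        }
    }

    public static void main(String[] args) {
        String url = baseUrl("127.0.0.1", 8983);
        System.out.println(url + (isUp(url) ? " is up" : " is not reachable"));
    }
}
```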

Setting up a Solr development environment

This section will guide you through the necessary steps to have a working development environment that allows you to have a place to write and execute your code or configurations against a running and debuggable Solr instance.

If you aren’t interested in such a perspective because, for instance, your usage scenario falls within the previous section, you can safely skip this and proceed with the next section.

The source code included with this book contains a ready-to-use project for this section. I will later explain how to get it into your workspace in one shot.

Prerequisites

The development workstation needs to have some software. As you can see, I kept the list small and minimal.

Firstly, you need the Java Development Kit 7 (JDK), of which I recommend the latest update, although the oldest version of Solr covered by this book (4.7.x) is able to run with Java 6. Java 7 is supported from 4.7.x to 4.10.x, so it is definitely a recommended choice.

Lastly, we need an IDE. Specifically, I will use Eclipse to illustrate and describe the developer perspective, so you should download a recent JSE version (that is, Eclipse IDE for Java Developers) from https://www.eclipse.org/downloads.

Note

Do not download the EE version of Eclipse because it contains a lot of things we don’t need in this book.

Starting from Eclipse Juno, all the required plugins are already included. However, if you love an older version of Eclipse (such as Indigo) like I do, then Maven integration for Eclipse, also known as M2Eclipse (M2E), needs to be installed. You can find this in the Eclipse marketplace (go to Help | Eclipse Marketplace, then search for m2e, and click on the Install button).

Importing the sample project of this chapter

It’s time to see some code, in order to touch things with your hands. We will guide you through the necessary steps to have your Eclipse configured with a sample project, where you will be able to start, stop, and debug Solr with your code.

First, you have to import to Eclipse the sample project in your local ch1 folder. I assume you already got the source code from the publisher’s website or from GitHub, as described in the Preface. Open Eclipse, create a new workspace, and go to File | Import | Maven | Existing Maven Projects.

In the dialog box that appears, select the ch1 folder and click on the Finish button. Eclipse will detect the Maven layout of that folder and will create a new project on your workspace, as illustrated in the following screenshot (Project Explorer view):

Understanding the project structure

The project you’ve imported is very simple and contains just a few lines of code, but it is useful for introducing some common concepts that will guide us through the book (the other chapters use examples with a similar structure).

The following table shows the structure of the project:

Folder or File        Description
src/main/java         The main source folder. It will contain the Solr extensions (and dependent classes) you want to implement. You won’t find this directory in this first project because we don’t have any source files yet.
src/main/resources    This contains project resources such as properties and configuration files. You won’t find this directory in this first project because we don’t have any resources yet.
src/test/java         This source folder contains unit and integration tests. For this first project, you will find a single integration test here.
src/test/resources    This contains test resources such as properties and configuration files. It includes a sample logging configuration (log4j.xml).
src/dev/eclipse       Preconfigured Eclipse launchers used to run Solr and the examples in the project.
src/solr-home         This contains the Solr configuration files. We will describe the content of this directory later.
pom.xml               This is the Maven project definition. Here, you can configure any feature of your project, including dependencies, properties, and so on.

Within the Maven project definition (that is, pom.xml), you can do a lot of things. For our purposes right now, it is important to underline the plugin section, where you can see the Maven Cargo Plugin (http://cargo.codehaus.org/Maven2+plugin) configured to run an embedded Jetty 7 container and deploy Solr. Here’s a screenshot that shows the Cargo Plugin configuration section:

If you have the Build automatically flag set (the default behavior in Eclipse), most probably Eclipse has already downloaded all the required dependencies. This is one of the great things about Apache Maven.

So, assuming that you have no errors, it’s now time to start Solr. But where is Solr?

The first question that probably comes to mind is: “I didn’t download Solr! Where is it?” The answer is still Apache Maven, which is definitely a great open source tool for software management and something that simplifies your life.

Maven is already included in your Eclipse (by means of the m2e plugin), and the project you previously imported is a fully compliant Maven project.

So don’t worry! When we start a Maven build, Solr will be downloaded automatically. But where? In your local Maven repository, and you don’t need to concern yourself with that.

Note

Within the pom.xml file, you will find a property, <solr.version>, with a specific value. If you want to use a different version, just change the value of this property.
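If you are curious about where the downloaded Solr artifact ends up, the following sketch computes its expected location. It assumes the default local repository location (the .m2/repository folder in your home directory, which can be overridden in Maven's settings) and the standard repository layout; the class name is illustrative, and the coordinates are taken from the org/apache/solr/solr/4.9.0/solr-4.9.0.war download path that Maven prints in this chapter.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative helper: builds the path of an artifact inside a Maven repository.
final class LocalRepositoryPath {

    /** Resolves groupId/artifactId/version/artifactId-version.packaging under the repository root. */
    static Path artifactPath(Path repositoryRoot, String groupId, String artifactId,
                             String version, String packaging) {
        Path path = repositoryRoot;
        // Each dot-separated part of the groupId becomes a directory.
        for (String segment : groupId.split("\\.")) {
            path = path.resolve(segment);
        }
        return path.resolve(artifactId)
                   .resolve(version)
                   .resolve(artifactId + "-" + version + "." + packaging);
    }

    public static void main(String[] args) {
        // Assumed default location of the local repository.
        Path repository = Paths.get(System.getProperty("user.home"), ".m2", "repository");
        // Use the same value as the solr.version property of the project.
        String solrVersion = args.length > 0 ? args[0] : "4.9.0";
        System.out.println(artifactPath(repository, "org.apache.solr", "solr", solrVersion, "war"));
    }
}
```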

Different ways to run Solr

It’s time to start Solr in your IDE for the first time but, prior to that, it’s important to distinguish the two ways to run Solr:

Background server: As a background server, so that you can start and stop Solr for debugging purposes
Integration test server: As an integration test server, so that you can have a dedicated Solr instance to run your integration test suite

Background server

The first thing you will need in your IDE is a server instance that you can start, stop, and (in general) manage with a few simple commands.

In this way, you will be able to have Solr running with your configurations. You can index your data and execute queries in order to (manually) ensure that things are working as expected.

To get this type of server, follow these instructions:

1. Right-click on the project and create a new Maven (Debug) launch configuration (Debug As | Maven build…).
2. In the dialog, type cargo:run in the Goals text field.
3. Next, click on the Debug button as shown in the following screenshot:

The very first time you run this command, Maven will download all the required dependencies and plugins, including Solr. At the end, it will start an embedded Jetty instance.

Note

Why a Debug instead of a Run configuration?

You must use a Debug configuration so that you will be able to stop the server by simply pressing the red button on the Eclipse console. Run configurations have an annoying habit: Eclipse will say the process is stopped, but Jetty will be still running, often leaving an orphan process.

You should see the following output in the Eclipse console:

[INFO] ------------------------------------------------------------
[INFO] Building Chapter 1 Project 1.0
[INFO] ------------------------------------------------------------
Downloading: http://repo1.maven.org/maven2/org/apache/solr/solr/4.9.0/solr-4.9.0.war
Downloaded: http://repo1.maven.org/maven2/org/apache/solr/solr/4.9.0/solr-4.9.0.war (28585 KB at 432.5 KB/sec)
...
[INFO] Jetty 7.6.15.v20140411 Embedded started on port [8983]

This means that Solr is up and running and it is listening on port 8983. Now open your web browser and type http://127.0.0.1:8983/solr. You should see the Solr administration console.

Tip

In the project, and specifically in the src/dev/eclipse folder, there are some useful, ready-to-use Eclipse launchers. Instead of following the manual steps illustrated previously, just right-click on the start-embedded-solr.launch file and go to Debug As | run-ch1-example-server.launch.

Integration test server

Another important thing you could (or should, in my opinion) do in your project is to have an integration test suite. Integration tests are classes that, as the name suggests, run verifications against a running server.

When you’re working on a project with Solr and you want to implement an extension, a search component, or a plugin, you will obviously want to ensure that it is working properly. If you’re running an external Solr server, you need to pack your classes in a jar, copy that bundle somewhere (later, we will see where), start the server, and execute your checks.

There are a lot of drawbacks with this approach. Each time you get something wrong, you need to repeat the whole process: fix, pack, copy, restart the server, prepare your data, and run the check again. Also, you cannot easily debug your classes (or Solr classes) during that iterative check. All of this will most probably end with a lot of statements in your code as follows:

System.out.println("BLABLABLA");

I suppose you know what I’m talking about.

This is where integration tests become very helpful. You can code your checks and your assertions as normal Java classes, and have an automated test suite that does the following each time it is executed:

Starts an embedded Solr instance
Executes your tests against that instance
Stops the Solr instance
Produces useful reports

The project we set up previously has that capability already, and there’s a very basic integration test in the src/test/java folder to simply add and query some data.

In order to run the integration test suite, create a new Maven run configuration (right-click on the project and go to Run As | Maven build…), and, in the dialog box, type clean install in the Goals text field.

After clicking on the Run button, you should see something like this:

...
[INFO] Jetty 7.6.15.v20140411 Embedded starting…
...
[INFO] Reading Solr Schema from schema.xml
...
[INFO] Jetty 7.6.15.v20140411 Embedded started on port [8983]
...
-------------------------------------------------------
TESTS
-------------------------------------------------------
Running org.gazzax.labs.solr.ase.ch1.it.FirstQueryITCase
...
Results:
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

Tip: As before, under the src/dev/eclipse folder, there is already a preconfigured Eclipse launcher for this scenario. Right-click on the start-embedded-solr.launch file and go to Debug As | run-the-example-as-integration-test.

From the Eclipse log, you can see that a test (specifically, an integration test) has been successfully executed. You can find the source code of that test in the project we checked out before. The name of the class that is reported in the log is FirstQueryITCase (IT stands for Integration Test), and it is in the org.gazzax.labs.solr.ase.ch1.it package.

The FirstQueryITCase.java class demonstrates a basic interaction flow we can have with Solr:

// This is the (input) Data Transfer Object between your client and SOLR.
final SolrInputDocument input = new SolrInputDocument();

// 1. Populates with (at least required) fields
input.setField("id", 1);
input.setField("title", "Apache SOLR Essentials");
input.setField("author", "Andrea Gazzarini");
input.setField("isbn", "972-2-5A619-12A-X");

// 2. Adds the document
client.add(input);

// 3. Commit changes
client.commit();

// 4. Builds a new query object with a "select all" query.
final SolrQuery query = new SolrQuery("*:*");

// 5. Executes the query
final QueryResponse response = client.query(query);

// 6. Gets the (output) Data Transfer Object.
final SolrDocument output = response.getResults().iterator().next();
final String id = (String) output.getFieldValue("id");
final String title = (String) output.getFieldValue("title");
final String author = (String) output.getFieldValue("author");
final String isbn = (String) output.getFieldValue("isbn");

// 7.1 In case we are running as a Java application print out the query results.
System.out.println("It works! I found the following book:");
System.out.println("--------------------------------------");
System.out.println("ID: " + id);
System.out.println("Title: " + title);
System.out.println("Author: " + author);
System.out.println("ISBN: " + isbn);

// 7. Otherwise asserts the query results using standard JUnit procedures.
assertEquals("1", id);
assertEquals("Apache SOLR Essentials", title);
assertEquals("Andrea Gazzarini", author);
assertEquals("972-2-5A619-12A-X", isbn);

Tip: FirstQueryITCase is an integration test and a main class at the same time. This means that you can run it in three ways: as described earlier, as a main class, and as a JUnit test. If you prefer the second or the third option, remember to start Solr first (using the run-ch1-example-server.launch). You can find the launchers under the src/dev/eclipse folder. Just right-click on one of them and run the example in one way or another.

What do we have installed?

Regardless of the kind of installation, you should now have a Solr instance up and running, so it’s time to have a quick overview of its structure.

Solr is a standard JEE web application, packaged as a .war archive. If you downloaded the bundle from the website, you can find it under the webapps folder of Jetty, usually under:

$INSTALL_DIR/solr-x.y.z/example/webapps

Instead, if you followed the developer way, Maven downloaded that war file for you, and it is now in your local repository (usually a folder called .m2 under your home directory).

Solr home

In any case, Solr has been installed and you don’t need to concern yourself with where it is physically located, mainly because all that you have to provide to Solr must reside in an external folder, usually referred to as the Solr home.

In the download bundle, there’s a preconfigured Solr home folder that corresponds to the $INSTALL_DIR/solr-x.y.z/example/solr folder. Within your Eclipse project, you can find that under the src folder; it is called (not surprisingly) solr-home.

In a Solr home folder, you will typically find a file called solr.xml, and one or more folders that correspond to your Solr cores (we will see what a core is in Chapter 2, Indexing Your Data). Each folder has a subfolder called conf where the configuration for that specific core resides.

solr.xml

The first file you will find within the Solr home directory is solr.xml. It declares some configuration parameters about the instance.

Previously (before Solr 4.4), you had to declare all the cores of your instance in this file. Now there’s a more intelligent autodiscovery mechanism that helps you avoid explicit declarations about the cores that are part of your configuration.

In the download bundle, you will find an example of a Solr home with only one core:

$INSTALL_DIR/solr-x.y.z/example/solr

There is also an example with two cores:

$INSTALL_DIR/solr-x.y.z/example/multicore

This directory is built using the old style we mentioned previously, with all the cores explicitly declared. In the Eclipse project, you can find the single core example in a directory called solr-home. The multicore example is in the example-solr-home-with-multicore folder.

schema.xml

Although the schema.xml file will be described in detail later, it is important to briefly mention it because this is the place where you declare how your index (of a specific core) is composed, in terms of fields, types, and analysis, both at index time and query time. In other words, this is the schema of your index and (most probably) the first thing you have to design as part of your Solr project.

In the download bundle you can find a sample schema.xml under the $INSTALL_DIR/solr-x.y.z/example/solr/collection1/conf folder. The file is huge and full of comments. It basically illustrates all the predefined fields and types you can use in Solr (you can create your own type, but that’s definitely an advanced topic).

If you want to see something simpler for now, the solr-home/conf directory of the Eclipse project has a very simple schema, with a few fields and only one field type.

solrconfig.xml

The solrconfig.xml file is where the configuration of a Solr core is defined. It can contain a lot of directives and sections but, fortunately, for most of them Solr’s creators have set default values that are automatically applied if you don’t declare them.

Note: Default values are good for a lot of scenarios. When I was in Barcelona at the Apache Lucene Eurocon in 2011, the speaker asked during a presentation, “How many of you have ever changed default values in solrconfig.xml?” In a large room (200 people), only five or six guys raised their hands.

This is most probably the second file you will have to configure. Once the schema has been defined, you can fine-tune the index chain and search behavior of your Solr instance here.

Other resources

Schema and Solr configurations can make use of other files for several purposes. Think about stop words, synonyms, or other configuration files specific to some component. Those files are usually put in the conf directory of the Solr core.

Troubleshooting

If you have problems related to what we described previously, the following tips should help you get things working.

UnsupportedClassVersionError

You can install more than one version of Java on your machine but, when running a command (for example, java or javac), the system will pick up the Java interpreter/compiler that is declared in your path. So if you get the UnsupportedClassVersionError error, it means that you’re using the wrong JVM (most probably Java 6 or older). In the Prerequisites section earlier in this chapter, there’s a table that will help you. However, this is the short version: Solr 4.7.x allows Java 6 or 7, but Solr 4.8 or greater runs only with (at least) Java 7.

If you’re starting Solr from the command line, just type this:

# java -version

The output of this command will show the version of Java your system is actually using. So make sure you’re running the right JVM, and also check your JAVA_HOME environment variable; it must point to the right JVM.

If you’re running Solr in Eclipse, after checking what is described previously (that is, the JVM that starts Eclipse), make sure you’re using a correct JVM by navigating to Window | Preferences | Java | Installed JREs.

The “Failed to read artifact descriptor” message

When running a command for the first time (for example, clean, install, or test), Apache Maven will have to download all the required libraries. In order to do that, your system must have a valid Internet connection.

So if you get this kind of message, it means that Maven wasn’t able to download a required dependency. The name of the dependency should be in the message. The reason for failure could be a network issue, either permanent or transient.

In the first case, you should simply check your connection. In the second scenario (that is, a transient network failure during the download), there are some manual steps that need to be done. Assume that the dependency is org.apache.solr:solr-solrj:jar:4.8.0. You should go to your local Maven repository and remove the content of the folder that hosts that dependency, like this:

# rm -rf $HOME/.m2/repository/org/apache/solr/solr-solrj/4.8.0

On the next build, Maven will download that dependency again.

Summary

In this chapter, we began our Solr tour with a quick overview, including the steps that must be performed when installing Solr. We illustrated the installation process from both a user’s and a developer’s perspective. Regardless of the path you followed, you should have a working Solr installed on your machine.

In the next chapter, we will continue our conversation by digging further into the Solr indexing process.

Chapter 2. Indexing Your Data

Although the final motive behind getting a Solr instance is to enable fast and efficient searches, we need to populate that instance with some data in the first (and mandatory) step. This operation is usually referred to as the indexing phase. The term index plays an important role in the Solr domain because its underlying structure is an index itself. This chapter focuses on the indexing process.

By the end of this chapter, you will be reasonably conversant with how the indexing process works in Solr, how to index data, and how to configure and customize the process.

This chapter will cover the following topics:

The Solr data model: inverted index, document, fields, types, analyzers, and tokenizers
Index and indexing configuration
The Solr write path
How to extend and customize the indexing process
Troubleshooting

Understanding the Solr data model

Whenever I start to learn something that is not simple, I strongly believe the key to controlling its complexity is a good understanding of its domain model. This section describes the underlying building blocks of Solr. It starts with the simplest piece of information, the document, and then walks through the other fundamental concepts, describing how they form the Solr data model.

The document

A document represents the basic and atomic unit of information in Solr. It is a container of fields and values that belong to a given entity of your domain model (for example, a book, car, or person).

If you’re familiar with relational databases, you can think of a document as a record. The two concepts have some similarities:

A document could have a primary key, which is the logical identity of the data it represents.
A document has a structure consisting of one or more attributes. Each attribute has a name, type, and value.

However, a Solr document differs in the following ways from a database record:

Attributes can have more than one value, whereas a column in a database row can hold only one value (including NULL).
Attributes either have a value or don’t exist at all. There’s no notion of a NULL value in Solr.
Attribute names can be static or dynamic, but table columns in a database must be explicitly declared in advance.
Attribute types are, in general, more articulated and flexible because they must define how Solr interprets data both at index and query time.
Attribute types can be defined and configured. This can be done by using, mixing, and configuring a rich set of built-in classes or creating new types (this is actually an advanced scenario).

A simple way to represent a Solr document is a map: a general data structure that maps unique keys (attribute names) to values, where each key (that is, attribute) can have one or more values. The following JSON data represents two documents:

[
  {
    "id": 27302038,
    "title": "A book about something",
    "author": ["Ashler, Frank", "York, Lye"],
    "subject": ["Generalities", "Social Sciences"],
    "language": "English"
  },
  {
    "id": 2830002,
    "title": "Another book about something",
    "author": "Ypsy, Lea",
    "subject": "Geography & History",
    "publisher": "Vignanello: Edikin, 2010"
  }
]

Although the earlier documents represent books and have some common attributes, as you can see, the first has two subjects and a language, while the second doesn’t have a publication language. It has only one subject and an additional publisher attribute.

From a document’s perspective, there’s no constraint about which and how many attributes a document can have. Those constraints are instead declared within the Solr schema, which we will see later.
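
The map view of a document described above can be sketched in plain Java. This is only an illustration under stated assumptions: the DocumentAsMap class and its methods are hypothetical names and are not part of Solr or SolrJ. It shows two of the points made earlier: an attribute can hold several values, and a missing attribute simply doesn’t exist (there is no NULL placeholder).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class: not part of Solr or SolrJ.
class DocumentAsMap {

    // Attribute name -> one or more values, in insertion order.
    private final Map<String, List<Object>> fields = new LinkedHashMap<>();

    // Adds a value to an attribute, creating the attribute on first use.
    DocumentAsMap add(String name, Object value) {
        fields.computeIfAbsent(name, key -> new ArrayList<>()).add(value);
        return this;
    }

    // Returns the values of an attribute, or an empty list if it doesn't exist.
    List<Object> values(String name) {
        return fields.getOrDefault(name, Collections.emptyList());
    }

    // An attribute either has at least one value or is absent altogether.
    boolean has(String name) {
        return fields.containsKey(name);
    }

    public static void main(String[] args) {
        DocumentAsMap book = new DocumentAsMap()
            .add("id", 27302038)
            .add("title", "A book about something")
            .add("author", "Ashler, Frank")
            .add("author", "York, Lye")
            .add("language", "English");

        System.out.println("authors: " + book.values("author"));
        System.out.println("has publisher? " + book.has("publisher"));
    }
}
```

A real client would use the SolrInputDocument class shown in the previous chapter; the sketch only makes the underlying data structure explicit.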

Tip: The src/solr/example-data folder of the project associated with this chapter contains some example data where the same documents are represented in several formats.

The inverted index

Solr uses an underlying, persistent structure called an inverted index. It is designed and optimized to allow fast searches at retrieval time. To gain the speed benefits of such a structure, it has to be built in advance.

An inverted index consists of an ordered list of all the terms that appear in a set of documents. Beside each term, the index includes a list of the documents where that term appears.

For example, let’s consider three documents:

[
  {"id": 1, "title": "The Birthday Concert"},
  {"id": 2, "title": "Live in Italy"},
  {"id": 3, "title": "Live in Paderborn"}
]

The corresponding inverted index would be something like this:

Terms        Document Ids
             1    2    3
Birthday     X
Concert      X
Italy             X
Live              X    X
Paderborn              X
The          X
In                X    X

Like the index of a book (here, I mean the index that you usually find at the end of a book), if you want to search documents that contain a given term, an inverted index helps you do that efficiently and quickly.
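
To make the structure concrete, here is a minimal sketch in plain Java that builds a term-to-document-ids map from the three example titles. It is an assumption-laden illustration, not how Lucene or Solr store an index: the InvertedIndexSketch class is a hypothetical name, terms are obtained by splitting on whitespace only, and no other analysis (such as lowercasing) is applied.

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration class: not part of Solr or Lucene.
class InvertedIndexSketch {

    // Builds an ordered map: term -> sorted set of ids of the documents containing it.
    static Map<String, SortedSet<Integer>> build(Map<Integer, String> titlesById) {
        Map<String, SortedSet<Integer>> index = new TreeMap<>();
        for (Map.Entry<Integer, String> doc : titlesById.entrySet()) {
            // Simplistic "analysis": split the title on whitespace.
            for (String term : doc.getValue().trim().split("\\s+")) {
                if (!term.isEmpty()) {
                    index.computeIfAbsent(term, key -> new TreeSet<>()).add(doc.getKey());
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<Integer, String> titles = new TreeMap<>();
        titles.put(1, "The Birthday Concert");
        titles.put(2, "Live in Italy");
        titles.put(3, "Live in Paderborn");

        // A lookup by term is a single map access instead of a scan of all documents.
        Map<String, SortedSet<Integer>> index = build(titles);
        index.forEach((term, ids) -> System.out.println(term + " -> " + ids));
    }
}
```

The cost of building the map up front is what buys the fast lookup at search time, which is why the index has to be built in advance.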

In Solr, index files are hosted in a so-called Solr data directory. This directory can be configured in solrconfig.xml, the main configuration file.

Tip: After running any example in the project associated with this book, you will find the Solr index under the subfolders located in target/solr. The name of the subfolder actually depends on the name of the core used in the example.

The Solr core

The index configuration of a given Solr instance resides in a Solr core, which is a container for a specific inverted index. On the disk, Solr cores are directories, each of them with some configuration files that define features and characteristics of the core.

In a core directory, you will typically find the following content:

A core.properties file that describes the core.
A conf directory that contains configuration files: a schema.xml file, a solrconfig.xml file, and a set of additional files, depending on components in use for a specific instance (for example, stopwords.txt and synonyms.txt).
A lib directory. Every JAR file placed in this directory is automatically loaded and can be used by that specific core.

In a Solr installation you can have one or more cores, each of them with a different configuration, which will therefore result in different inverted indexes.

Note: The concept of the Solr core has been expanded in Solr 4, specifically in SolrCloud. We will discuss this in Chapter 6, Deployment Scenarios.

The Solr schema

Returning to the comparison with databases, another important difference is that, in relational databases, data is organized in tables. You can create one or more tables depending on how you want to organize the persistence of the entities belonging to your domain model.

In Solr, things behave differently. There’s no notion of tables; in a Solr schema, you must declare attributes, a primary key, and a set of constraints and features of the entity represented by the incoming documents. Although this doesn’t strictly mean you must have only one entity in your schema, let’s think in this way at the moment (for simplicity): a Solr schema is like the definition of a single table that describes the structure and the constraints of the incoming data (that is, documents).

The Solr schema is defined in a file called (not surprisingly) schema.xml. It contains several concepts, but the most important are certainly those related to types and fields. Before Solr 4.8, types and fields were declared within a <types> and a <fields> tag, respectively. Now their declarations can be mixed, which allows better grouping of fields with their corresponding types.

Tip: You can find a sample schema within the download bundle we set up in the previous chapter, specifically under $INSTALL_DIR/solr-x.y.z/example/solr/collection1/conf/schema.xml. It is huge and contains a lot of examples about predefined and built-in types and fields, with many useful comments.

Field types

Field types are one of the top-level entities declared in Solr schemas. A field type is declared using the <fieldType> element. As you can see in the example schema, you can have a simple type, such as this:

<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

You can also have types with a lot of information, as shown here:

<fieldType name="text-general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/>
  </analyzer>
</fieldType>

All types share a set of common attributes that are described in the following table:

Attribute  Description

name  The name of the field type. This is required.

class  The fully qualified name of the class that implements the field type behavior. This is required.

sortMissingFirst, sortMissingLast  Optional attributes that are valid only for sortable fields. They define the sort position of the documents that have no values for a given field.

indexed  If this is true, fields associated with this type will be searchable, sortable, and facetable.

stored  If this is true, fields associated with this type are retrievable. Briefly, stored fields are what Solr returns in search responses.

multiValued  If this is true, fields associated with this type can have multiple values.

omitNorms  Norms are values consisting of one byte per field where Solr records index time boost and length normalization data. Index time boost allows one field to be boosted higher than another. Length normalization allows shorter fields to be boosted more than longer fields. If you don’t use index time boost and don’t want to use length normalization, then this attribute can be set to true.

omitTermFreqAndPositions  Tokens produced by text analysis during the index process are not simply text. They also have metadata such as offsets, term frequency, and optional payloads. If this attribute is set to true, then Solr won’t record term frequencies and positions.

omitPositions  Omits the positions in indexed tokens.

positionIncrementGap  When a field has multiple values, this attribute specifies the distance between each value. This is used to prevent unwanted phrase matches.

autoGeneratePhraseQueries  Only valid for text fields. If this is set to true, then Solr will automatically generate phrase queries for adjacent terms.

compressed  In order to decrease the index size, stored values of fields can be compressed.

compressThreshold  Whenever the field is compressed, this is the associated compression threshold.

Besides all of this, each specific type can declare its own attributes, depending on the characteristics of the type itself.

The text analysis process

Before talking about fields, which are the top-level building blocks of the Solr schema, let’s introduce a fundamental concept: text analysis.

The text analysis process converts an incoming value into tokens by means of a dedicated transformation chain that is in charge of manipulating the original input value. Each resulting token is then posted to the index with the following metadata:

Position increment: The position of the token relative to the previous token in the input stream
Start and end offset: The starting and ending indexes of the token within the input stream
Payload: An optional byte array used for several purposes, such as boosting

A token with its metadata is usually referred to as a term.

In Solr, text analysis happens at two different moments: index and search time. In the first case, the value is the content of a given field of a given document that a client sent for indexing. In the second case, the incoming value typically contains search terms within a query.

In both cases, you must tell Solr how to handle those values. You can do that in the schema, in the field types section.

For field types, the following general rules always apply:

If the field type implementation class is solr.TextField or it extends solr.TextField, then Solr allows you to configure one or two analyzer sections in order to customize the index and/or the query text analysis process
In other cases, no analyzers can be defined, and the configuration of the type is done using the available attributes of the type itself

This is an example of a field type definition:

<fieldType name="text-general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
  </analyzer>
  <analyzer type="query">
  </analyzer>
</fieldType>

Here, you can see two different analyzer sections. In the first section, you will declare what happens at index time for a given field associated with that field type. The second section has the same purpose, but it is valid for query time.

Note: If you have the same analysis at index and query times, you can define just one <analyzer> section with no type attribute. It will be applied to both phases.

Within each analyzer definition, you define the text analysis process by means of character filters, tokenizers, and token filters.

Char filters

Char filters are optional components that can be set at the beginning of the analysis chain in order to preprocess field values. They can manipulate a character stream by adding, removing, or replacing characters while preserving the original character position.

In the following example, two char filters are used to replace diacritics (that is, letters with diacritical marks such as à, ü) and remove some text:

<analyzer type="index">
  <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
  <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\(Author\)" replacement=""/>
</analyzer>

Note: You must never declare the implementation class. Instead, declare its factory.

Using the preceding chain, the Millöcker, Carl text (name of author) will become Millocker, Carl.

A complete list of available char filters can be found at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#CharFilterFactories.
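
The effect of the two char filters can be approximated in plain Java. The sketch below rests on several assumptions: the CharFilterSketch class is a hypothetical name, Unicode decomposition stands in for the mapping-FoldToASCII.txt file (the real mapping covers more cases), the input is assumed to carry an “(Author)” marker, and character offsets are not tracked as a real char filter would.

```java
import java.text.Normalizer;

// Hypothetical illustration class: it imitates the effect of the two char
// filters, not their Solr implementation.
class CharFilterSketch {

    // Decomposes accented letters (NFD) and drops the combining marks,
    // so that "ö" becomes "o". Assumption: this only approximates ASCII folding.
    static String foldDiacritics(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    // Removes the literal text "(Author)" wherever it appears.
    static String removeAuthorMarker(String input) {
        return input.replaceAll("\\(Author\\)", "");
    }

    // Applies the two steps in the same order as the chain in the text.
    static String preprocess(String input) {
        return removeAuthorMarker(foldDiacritics(input));
    }

    public static void main(String[] args) {
        // \u00f6 is the letter ö, written as an escape to avoid encoding issues.
        String raw = "Mill\u00f6cker, Carl (Author)";
        System.out.println("[" + preprocess(raw) + "]");
    }
}
```

Note that removing the marker leaves the surrounding whitespace in place; in a real chain, the tokenizer that follows would take care of it.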

Tokenizers

A tokenizer breaks an incoming character stream into one or more tokens depending on specific criteria. The resulting set of tokens is usually referred to as a token stream. An analyzer chain allows only one tokenizer.

Suppose we have “I’m writing a simple text” as the input text. The following table shows how two sample tokenizers work:

Tokenizer            Description             Tokens
WhitespaceTokenizer  Splits by whitespaces   “I’m”, “writing”, “a”, “simple”, “text”
KeywordTokenizer     Doesn’t split at all    “I’m writing a simple text”

A complete list of available tokenizers can be found at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#TokenizerFactories.

Token filters

Token filters work on an input token stream, contributing some kind of transformation to it. Analyzing token after token, a filter can apply its logic in order to add, remove, or replace tokens, and can thus produce a new output token stream.

Token filters can be chained together in order to produce complex analysis chains. The order in which those filters are declared is important because the chain itself is not commutative. Two chains with the same filters in a different order could produce a different output stream.

This is an extract of a sample filter chain:

<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>

A filter declaration includes the name of the implementation factory class and a set of attributes that are specific to each filter. In the preceding chain, this is what happens for each token in the input stream:

The token is made into lowercase, so “Happy” will become “happy”
If the token is a stop word, that is, one of the words declared in a file called stopwords.txt, it gets filtered from the outgoing stream

A complete list of available token filters is available at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#TokenFilterFactories.
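
The remark about filter order can be illustrated with a small sketch in plain Java. Everything here is hypothetical and simplified: the TokenFilterChainSketch class is not a Solr API, tokens are plain strings without position or offset metadata, and the stop filter is deliberately case-sensitive (unlike the ignoreCase="true" setting in the extract above) so that the effect of the ordering becomes visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Hypothetical illustration class: tokens are plain strings, with no metadata.
class TokenFilterChainSketch {

    // Whitespace tokenizer: the single tokenizer at the head of the chain.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Token filter that lowercases every token.
    static UnaryOperator<List<String>> lowerCase() {
        return tokens -> tokens.stream()
            .map(token -> token.toLowerCase(Locale.ROOT))
            .collect(Collectors.toList());
    }

    // Token filter that drops stop words (case-sensitive on purpose).
    static UnaryOperator<List<String>> stop(Set<String> stopWords) {
        return tokens -> tokens.stream()
            .filter(token -> !stopWords.contains(token))
            .collect(Collectors.toList());
    }

    // Runs the tokenizer, then each filter in declaration order.
    static List<String> analyze(String text, List<UnaryOperator<List<String>>> filters) {
        List<String> tokens = tokenize(text);
        for (UnaryOperator<List<String>> filter : filters) {
            tokens = filter.apply(tokens);
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the"));
        String text = "The Happy Concert";

        // Lowercase first: "The" becomes "the" and is then removed as a stop word.
        System.out.println(analyze(text, Arrays.asList(lowerCase(), stop(stopWords))));

        // Stop filter first: "The" doesn't match "the", so it survives and is lowercased later.
        System.out.println(analyze(text, Arrays.asList(stop(stopWords), lowerCase())));
    }
}
```

The same two filters produce different token streams depending on their order, which is the non-commutativity described above.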

Putting it all together

The following code illustrates a complete field type definition:

<fieldType name="my-text-type" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

In order to get a concrete view of what happens during the index phase of a given field, open a shell in the top-level directory of the project associated with this chapter. Next, type the following command:

# mvn cargo:run -PfieldAnalysis

Tip: You can do the same with Eclipse by creating a new Maven Debug launch configuration. On the launch dialog, you must fill the Goals input field with cargo:run and the Profile input field with fieldAnalysis.

That will start a Solr instance with an example schema that contains several types. Once Solr has been started, open your browser and type http://127.0.0.1:8983/solr/#/analysis/analysis. The page that appears lets you simulate the index phase of a given value (the content of the left text area) for a given field or field type (the content of the drop-down menu at the bottom of the page).

Type some text in the Field Value (Index) text area, choose a field type or a field, and press the Analyse Values button. The page will show the input and the output values of each member of the index chain. The following screenshot illustrates the resulting page after analyzing the “Apache Solr” text with a right_truncated_phrase field type:

Some example field types

This section lists and describes some important field types and their main features in a non-exhaustive way. The schema.xml file in the download bundle contains a lot of examples with all the available types.

In addition, a list of all field types is available at https://cwiki.apache.org/confluence/display/solr/Field+Types+Included+with+Solr.

String

The string type retains the incoming value as a single token.

Note: That doesn’t mean the field cannot be indexed. It only means that the field cannot have a user-defined analysis chain.

This type is usually associated with the following:

Indexed fields: Fields that represent codes, classifications, and identifiers, such as A340, 853.92, SKU#22383, 3919928832, 292381, and en-US
Sort fields: Fields that can be used as sort criteria, such as authors, titles, and publication dates

Numbers

There are several numeric types defined in Solr. They can be classified into three groups:

Basic types such as IntField, FloatField, and LongField. These are the legacy types that encode numeric values as strings.
Sortable field types such as SortableDoubleField, SortableIntField, and SortableLongField. These are the legacy types that encode numeric values as strings in order to match their natural numeric order (this is different from the string’s lexicographic order).
Trie field types such as TrieIntField, TrieFloatField, and TrieLongField. These are the types that index numeric values using various and tunable levels of precision in order to enable efficient range queries and sorting. Those levels are configured using a precisionStep attribute in the field type definition.

The first two groups, basic and sortable types, are deprecated and will soon be removed (most probably in Solr 5.0). This is because their features and characteristics are already included in Trie types, which are more efficient and provide a unified way of dealing with numbers.

Boolean

Boolean fields can have a value of true or false. Values of 1, t, or T are interpreted as true.

Date

The format that Solr uses for dates is a restricted version of the ISO 8601 Date and Time format and is of the YYYY-MM-DDThh:mm:ss.SSSZ form. Here are some examples of this field type:

2005-09-27T14:43:11Z
2011-08-23T02:43:00.992Z

The Z character is a literal, trailing constant that indicates that the date is expressed in UTC. Only the milliseconds are optional. If they are missing, the dot (.) after the seconds must be removed.
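
On the client side, values in this format can be read with the standard java.time API. The following is a sketch under assumptions: the SolrDateFormatSketch class is a hypothetical name, and Instant.parse is used only as an approximation of the format, since it accepts some inputs (for example, sub-millisecond precision) that go beyond the restricted form described above.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;

// Hypothetical illustration class: it approximates the date format with java.time.
class SolrDateFormatSketch {

    // Parses a UTC timestamp such as 2005-09-27T14:43:11Z (milliseconds are optional).
    static Instant parse(String value) {
        return Instant.parse(value);
    }

    // Returns true when the value can be parsed by parse(...).
    static boolean isParsable(String value) {
        try {
            parse(value);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("2005-09-27T14:43:11Z").toEpochMilli());
        System.out.println(parse("2011-08-23T02:43:00.992Z").toEpochMilli());

        // A value without the T separator and the trailing Z is rejected.
        System.out.println(isParsable("2005-09-27 14:43:11"));
    }
}
```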

As with numbers, there are two available types to represent dates in Solr:

A basic DateField type, which is a deprecated legacy type
TrieDateField, which is the recommended date type

A useful feature of date types is a simple expression language that can be used to form dynamic date expressions, like this:

NOW+2YEARS
NOW+3YEARS-3DAYS
2005-09-27T14:43:00Z+1YEAR

The expression language allows the following keywords:

Keyword  Description

YEAR/YEARS  One or more years. These are basically synonyms; the difference is just to make the expressions more readable (for example, 2YEARS is better than 2YEAR).

MONTH/MONTHS  One or more months (for example, NOW+4MONTHS, NOW-1MONTH).

DAY/DAYS/DATE  A day or a certain number of days (for example, NOW+1DAY).

HOUR/HOURS  An hour or a certain number of hours.

MINUTE/MINUTES  One or more minutes.

MILLI/MILLIS/MILLISECOND/MILLISECONDS  One or more milliseconds.

Text

Text is the basic type for fields that can have a configurable text analysis. This is the only type that accepts analyzer chains in configurations.

Other types

The following list briefly describes some other interesting types:

Currency: This type provides support for monetary values with a dedicated type. It also includes the capability to plug in several providers for determining exchange rates between currencies.
Binary: This type is used to handle binary data. Data is sent and retrieved in Base64-encoded strings.
Geospatial types: Two types are available to support geospatial searches. The first is LatLonType, from Solr 3.x onwards. The second type, SpatialRecursivePrefixTreeFieldType, is a new type introduced in Solr 4, and it supports polygon shapes.
Random: This is used to generate random sequences. It is useful if you want pseudorandom sort ordering of indexed documents.

Fields

Fields are containers of values associated with a specific type. They represent the structure and the composition of the entity of your domain model.

In simple words, fields are the attributes of the documents you’re going to manage with Solr. So, for example, if Solr serves a library Online Public Access Catalogue (OPAC), the entities in the schema will most probably represent books, and they could have fields such as title, author, ISBN, cover, and so on.

Fields are declared in the schema. Each field declaration includes a name, type, and set of attributes. This is an example of a field declaration:

<field name="title" type="string" indexed="false" stored="true" required="true" multiValued="false"/>

The following table lists the attributes that can be specified for each field:

Attribute  Description

name  The name of the field must be unique in the schema and must consist only of alphanumeric and underscore characters. It must not start with an underscore, and it must not have both a leading and a trailing underscore because those kinds of names are reserved.

type  This is the type associated with the field.

indexed  If this is true, the field will be searchable, sortable, and facetable. It overrides the same setting on the associated type.

stored  If this is true, it makes the field retrievable. It overrides the same setting on the associated type.

required  This marks the field as mandatory in input documents.

default  A default value that will be used at index time, if the field in the input document doesn’t have a valid value.

sortMissingFirst, sortMissingLast  These are optional attributes defining the sort position of the documents that have no values for that field. They override the same settings on the associated type.

omitNorms  Omits the norms associated with this field. Overrides the same attribute on the field type.

omitPositions  Omits the term positions associated with this field. Overrides the same attribute on the field type.

omitTermFreqAndPositions  Omits the term frequency and positions associated with this field. Overrides the same attribute on the field type.

termVectors  Stores the term vectors. A term vector is a list of the document’s terms and their number of occurrences in that document.

docValues  Only available for the String, Trie, and UUID fields. This attribute enhances the index by adding a column-oriented, document-to-value mapping for the field.

Static fields

The first category of fields contains those statically declared in the schema. In this context, static simply means that the name of the field is explicitly known in advance. This is an example of a static field:

<field name="isbn" (other attributes follow)/>

Dynamic fields

There are certain situations where you don’t know in advance the name of some fields in the incoming documents. Although this may sound strange, it is a rather frequent scenario.

Think about a document that represents a book and is the result of some kind of cataloguing. In general, a bibliographic record has a lot of fields. Some of them represent text that can be expressed by cataloguers in several languages. For example, you can have a book with these abstracts:

{

"id":92902893,

"abstract_en":"ThisistheEnglishsummary",

"abstract_es":"Ésteeselresumenenespañol",

(otherfieldsfollow)

}

Youcanhaveanotherbookwiththefollowingdefinition:

{

"id":92902893,

"abstract_it":"L'automazionedellabibliotecadigitale"

www.it-ebooks.info

(otherfieldsfollow)

}

Sothequestionhereis,howcanwedefinetheabstractfield(orfields)inourschema?Thefirstapproachcouldbetodeclareseveralstaticfields—oneforeachlanguage—butthiswillbevalidonlyifweknowalltheinputlanguagesinadvance.Moreover,thisisnotveryextensiblebecauseaddinganewlanguage(forexample,abstract_ru)willrequireachangeintheschema.Dynamicfieldsarethealternative.

Afieldisdynamicwhenitsnameincludesaleadingoratrailingwildcard,thereforeallowingadynamicmatchwithincominginputfields.Adynamicfieldisdeclaredusingthe<dynamicField>element,asfollows:

<dynamicFieldname="abstract_*"(otherattributesfollow)/>

Thefieldwillcatchallfieldsthathaveaprefixequaltoabstract.Hence,itavoidstheneedtostaticallydefinefieldsonebyone,butmostimportantly,itwillcatchanyabstractfieldregardlessofitslanguagesuffix.
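The matching rule described above can be sketched in plain Java. This is only an illustration of leading and trailing wildcards, not Solr’s own implementation; the DynamicFieldPattern class and its method are hypothetical names introduced here.

```java
import java.util.List;

/** Illustrative matcher for dynamic-field style patterns (not Solr's code). */
final class DynamicFieldPattern {
    private final String pattern;

    DynamicFieldPattern(String pattern) {
        this.pattern = pattern;
    }

    /** Returns true if the field name matches a leading or trailing wildcard. */
    boolean matches(String fieldName) {
        if (pattern.endsWith("*")) {
            // Trailing wildcard: compare the fixed prefix, e.g. "abstract_"
            return fieldName.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        if (pattern.startsWith("*")) {
            // Leading wildcard: compare the fixed suffix, e.g. "_txt"
            return fieldName.endsWith(pattern.substring(1));
        }
        // No wildcard: behaves like a static field name
        return fieldName.equals(pattern);
    }

    public static void main(String[] args) {
        DynamicFieldPattern abstracts = new DynamicFieldPattern("abstract_*");
        for (String name : List.of("abstract_en", "abstract_ru", "title")) {
            System.out.println(name + " -> " + abstracts.matches(name));
        }
    }
}
```

With this rule, a new language such as abstract_ru is accepted without any change to the declaration.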

Copy fields

In the Solr schema, you can use a special copyField directive to copy one field to another. This is useful when a document has a given field and, starting from its value, you want to have other fields in your schema populated with the same value but with a different text analysis.

Let’s suppose your documents represent books that can contain two different kinds of authors:

persons (for example, Dante Alighieri and Leonardo Da Vinci)
corporates (for example, Association for Childhood Education International)

You must show those authors separately in the user interface, as part of customer requirements. You can give them dedicated labels, for example. At the same time, the customer wants to have an author search feature on the user interface that triggers a search for all kinds of authors. The following screenshot shows a GUI widget that is often used in these scenarios: a search toolbar with a drop-down menu that allows the user to constrain the scope of the search within a given context (for example, authors, subjects, and titles):

[Screenshot: search toolbar with a drop-down menu for the search scope]

A first approach could be to have two stored and indexed fields. When the user searches for an author by typing a name or a surname, such terms will be searched within those two fields. The schema in this case should be as follows:

<field name="author_person" type="text" indexed="true" stored="true" …/>
<field name="author_corporate" type="text" indexed="true" stored="true" …/>

A second choice could be to have a more cohesive design by separating search and view responsibilities. In this case, we will have two stored (but not indexed) fields representing the two kinds of authors, and a generic indexed (but not stored) author_search field containing all the authors of a document, regardless of their type. In this way, the user interface will use the stored fields for visualization, while Solr will use the catch-all author_search field for searches. This design relies on the copyField directive; here is the corresponding schema:

<field name="author_person" type="string" indexed="false" stored="true"
       required="false" multiValued="true"/>
<field name="author_corporate" type="string" indexed="false" stored="true"
       required="false" multiValued="true"/>
<field name="author_search" type="text" indexed="true" stored="false"
       required="false" multiValued="true"/>
<copyField source="author_person" dest="author_search"/>
<copyField source="author_corporate" dest="author_search"/>

The copyField directive copies the incoming value of the source field into the dest field; thus, at the end, the author_search field will contain all kinds of authors.

Note
In both the source and dest attributes, it’s possible to use a trailing or a leading wildcard, therefore avoiding repetitive code. In the preceding example, we could have just one copyField declaration:

<copyField source="author_*" dest="author_search"/>

Other schema sections

Other than fields and field types, the Solr schema contains some other things as well. This section briefly illustrates them.

Unique key

This field uniquely identifies your document. It is not strictly required but strongly recommended if you want to update your documents, avoid duplicates, and (last but not least) use Solr distributed features.

Default similarity

This element allows you to declare the factory of the class used by Solr to determine the score of documents while searching.

Solr indexing configuration

Once the schema has been defined, it’s time to configure and tune the indexing process by means of another file that resides in the same directory as the schema: solrconfig.xml.

The file contains a lot of sections, but fortunately, many of them are optional and have default values that usually work well in most scenarios. We will highlight the most important of them with respect to this chapter.

As a general note, it’s possible to use system properties and default values within this file. Therefore, we are able to create a dynamic expression, like this:

<dataDir>${my.data.dir:/var/data/defaultDataDir}</dataDir>

The value of the dataDir element will be replaced at runtime with the value of the my.data.dir system property, or with the default value of /var/data/defaultDataDir if that property doesn’t exist.
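To make the substitution rule concrete, here is a minimal sketch of how a ${name:default} expression could be resolved against Java system properties. It is not Solr’s code: the PropertyPlaceholders class is a hypothetical name, and falling back to an empty string when neither a property nor a default exists is a simplifying assumption of this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative resolver for ${name:default} expressions (not Solr's code). */
final class PropertyPlaceholders {
    // Matches ${name} or ${name:default}
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

    static String resolve(String text) {
        Matcher matcher = PLACEHOLDER.matcher(text);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            // Assumption: no property and no default resolves to an empty string
            String fallback = matcher.group(2) != null ? matcher.group(2) : "";
            String value = System.getProperty(matcher.group(1), fallback);
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Try it with: java -Dmy.data.dir=/tmp/index PropertyPlaceholders
        System.out.println(resolve("<dataDir>${my.data.dir:/var/data/defaultDataDir}</dataDir>"));
    }
}
```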

General settings

The heading part of the solrconfig.xml file contains general settings that are not strictly related to the index phase.

The first is the Lucene match version:

<luceneMatchVersion>LUCENE_47</luceneMatchVersion>

This allows you to control which Lucene version Solr should remain compatible with internally. This is useful to manage migration phases towards newer versions of Solr, thus allowing backward compatibility with indexes built with previous versions.

A second piece of information you can set here is the data directory, that is, the directory where Solr will create and manage the index. It defaults to a directory called data under $SOLR_HOME.

<dataDir>/var/data/defaultDataDir</dataDir>

Index configuration

The section within the <indexConfig> tag contains a lot of things that you can configure in order to fine-tune the Solr index phase.

A curious thing you can see in this section, in the solrconfig.xml file of the example core, is that most things are commented out. This is very important, because it means that Solr provides good default values for those settings.

The following table summarizes the settings you will find within the <indexConfig> section:

Attribute Description

writeLockTimeout The maximum allowed time to wait for a write lock on an IndexWriter.

maxIndexingThreads The maximum allowed number of threads that index documents in parallel. Once this threshold has been reached, incoming requests will wait until there’s an available slot.

useCompoundFile If this is set to true, Solr will use a single compound file to represent the index. The default value is false.

ramBufferSizeMB When accumulated document updates exceed this memory threshold, all pending updates are flushed.

maxBufferedDocs This has the same behavior as that of the previous attribute, but the threshold is defined as the count of document updates.

mergePolicy The name of the class, along with settings, that defines and implements the merge strategy.

mergeFactor A threshold indicating how many segments an index is allowed to have before they are merged into one segment. Each time an update is made, it is added to the most recent index segment. When that segment fills up (that is, when the maxBufferedDocs or ramBufferSizeMB threshold is reached), a new segment is created and subsequent updates are inserted there. Once the number of segments reaches this threshold, Solr will merge all of them into one segment.

mergeScheduler The class that is responsible for controlling how merges are executed.

lockType The lock type used by Solr to indicate that a given index is already owned by an IndexWriter.

Update handler and autocommit feature

The <updateHandler> section configures the component that is responsible for handling requests to update the index.

This is where it’s possible to tell Solr to periodically run unsolicited commits so that clients won’t need to do that explicitly while indexing. Auto-commits can be triggered by declaring two different thresholds:

maxDocs: The maximum number of documents to add since the last commit
maxTime: The maximum amount of time (in milliseconds) allowed to pass after a document is added before a commit is triggered

They are not exclusive, so it’s perfectly legal to have settings such as these:

<autoCommit>
  <maxDocs>5000</maxDocs>
  <maxTime>300000</maxTime>
</autoCommit>

Starting from Solr 4.0, there are two kinds of commit. A hard commit flushes the uncommitted documents to the index, therefore creating and changing segments and data files on the disk. The other type is called a soft commit, which doesn’t actually write uncommitted changes to disk but just reopens the internal Solr searcher in order to make uncommitted data in memory available for searches.

Hard commits are expensive, but after their execution, data is permanently part of the index. Soft commits are fast but transient, so in case of a system crash, changes are lost.

Hard and soft commits can coexist in a Solr configuration. The following is an example that shows this:

<autoCommit>
  <maxTime>900000</maxTime>
</autoCommit>
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>

Here, a soft commit will be triggered every second (1000 milliseconds), and a hard commit will run every 15 minutes (900000 milliseconds).
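The way the two thresholds combine can be sketched as a small policy object: a commit is due as soon as either limit is reached. This is a simplified model under stated assumptions, not Solr’s actual commit tracker; AutoCommitPolicy and its methods are hypothetical names, and time is passed in explicitly to keep the example deterministic.

```java
/** Simplified model of the maxDocs / maxTime auto-commit thresholds. */
final class AutoCommitPolicy {
    private final int maxDocs;
    private final long maxTimeMs;
    private int pendingDocs = 0;
    private long firstPendingAtMs = -1;

    AutoCommitPolicy(int maxDocs, long maxTimeMs) {
        this.maxDocs = maxDocs;
        this.maxTimeMs = maxTimeMs;
    }

    /** Records one added document; returns true if a commit is now due. */
    boolean onAdd(long nowMs) {
        if (pendingDocs == 0) {
            // The clock starts with the first uncommitted document
            firstPendingAtMs = nowMs;
        }
        pendingDocs++;
        return isCommitDue(nowMs);
    }

    /** A commit is due when either threshold is reached (they are not exclusive). */
    boolean isCommitDue(long nowMs) {
        if (pendingDocs == 0) {
            return false;
        }
        return pendingDocs >= maxDocs || nowMs - firstPendingAtMs >= maxTimeMs;
    }

    /** Resets the counters after a commit has been executed. */
    void committed() {
        pendingDocs = 0;
        firstPendingAtMs = -1;
    }
}
```

A caller would invoke onAdd for each incoming document and run the commit (followed by committed()) whenever it returns true; a periodic timer could call isCommitDue to cover the maxTime case when no new documents arrive.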

RequestHandler

A RequestHandler instance is a pluggable component that handles incoming requests. It is configured in solrconfig.xml as a specific endpoint by means of its name attribute.

Requests sent to Solr can belong to several categories: search, update, administration, and stats. In this context, we are interested in those handlers that are in charge of handling index update requests. Although not mandatory, those handlers are usually associated with a name starting with the /update prefix, for example, the default handler you will find in the configuration:

<requestHandler name="/update" class="solr.UpdateRequestHandler"/>

Prior to Solr 4, each kind of input format (for example, JSON, XML, and so on) required a dedicated handler to be configured. Now the general-purpose update handler, that is, the /update handler, uses the content type of the incoming request in order to detect the format of the input data. The following table lists the built-in content types:

Mime-type Description

application/xml, text/xml XML messages

application/json, text/json JSON messages

application/csv, text/csv Comma-separated values

application/javabin Java-serialized objects (Java clients only)
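The content-type dispatch summarized in the table can be pictured as a simple lookup. The sketch below only models the idea; ContentTypeRegistry and the format labels it returns are hypothetical names, and the real handler maps content types to loader objects rather than strings.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative lookup from a Content-Type header to an input format label. */
final class ContentTypeRegistry {
    private final Map<String, String> formats = new HashMap<>();

    ContentTypeRegistry() {
        formats.put("application/xml", "xml");
        formats.put("text/xml", "xml");
        formats.put("application/json", "json");
        formats.put("text/json", "json");
        formats.put("application/csv", "csv");
        formats.put("text/csv", "csv");
        formats.put("application/javabin", "javabin");
    }

    /** Returns the format label, or null if the content type is not registered. */
    String formatFor(String contentTypeHeader) {
        // Drop parameters such as "; charset=UTF-8" before the lookup
        String mime = contentTypeHeader.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return formats.get(mime);
    }
}
```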

Each format has its own way of encoding the kind of update operation (for example, add, delete, and commit) and the input documents. This is a sample add command in XML:

<add>
  <doc>
    <field name="id">12020</field>
    <field name="title">Round around midnight</field>
  </doc>
</add>

Later, we will index some data using different techniques and different formats.

UpdateRequestProcessor

The write path of the index process has been conceived by Solr developers with modularity and extensibility in mind. Specifically, the index process has been structured as a chain of responsibility, where each component adds its own contribution to the whole index process.

The UpdateRequestProcessor chain is an important configurable aspect of the index process. If you want to declare your custom chain, you need to add a corresponding section within the configuration. This is an example of a custom chain:

<updateRequestProcessorChain name="my-index-chain">
  <processor class="…"/>
  <processor class="…">
    <str name="aParameterName">aParameterValue</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
</updateRequestProcessorChain>

Defining a new chain requires a name and a set of UpdateRequestProcessorFactory components that are in charge of creating processor instances for that chain.

Note
Actually, the definition of the chain is not enough. It must be enabled (that is, associated with a RequestHandler through the name of the chain) in the following way:

<requestHandler name="/myReqHandler" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">my-index-chain</str>
  </lst>
</requestHandler>

There are a lot of already implemented UpdateRequestProcessor components that you can use in your chain, but in general, it’s easy to create your own processor and customize the index chain.

Tip
The example project with this chapter contains several examples of UpdateRequestProcessor within the org.gazzax.labs.solr.ase.ch2.urp package.

Index operations

This section shows you the basic commands needed for updating an index, by adding or removing documents. As a general note, each command we will see can be issued in at least two ways: using the command line, for example through the cURL tool (a built-in tool in a lot of Linux distributions and available for all platforms), and using code (that is, SolrJ or some other client API). When you want to add documents, it’s also possible to run those commands from the administration console.

Note
SolrJ and client APIs will be covered later in a dedicated chapter.

Another common aspect of these interactions is the Solr response, which always contains a status and a QTime attribute. The status is the return code of the executed command, which is always 0 if the operation succeeds. The QTime attribute is the elapsed time of the execution. This is an example of the response in XML format:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">97</int>
  </lst>
</response>
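A client that talks to Solr over plain HTTP can check these two values with the XML parser bundled in the JDK. The following is a minimal sketch, assuming the response has exactly the shape shown above; ResponseHeaderReader is a hypothetical helper, not part of SolrJ.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Reads integer entries such as status and QTime from an XML response. */
final class ResponseHeaderReader {
    static int readInt(String xml, String name) {
        Document document;
        try {
            document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Unable to parse the response", e);
        }
        // Look for <int name="..."> elements anywhere in the response
        NodeList ints = document.getElementsByTagName("int");
        for (int i = 0; i < ints.getLength(); i++) {
            Element element = (Element) ints.item(i);
            if (name.equals(element.getAttribute("name"))) {
                return Integer.parseInt(element.getTextContent().trim());
            }
        }
        throw new IllegalArgumentException("No int entry named " + name);
    }
}
```

A caller would typically treat any status other than 0 as a failure and log QTime for monitoring.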

Add

The add command sends one or more documents to Solr. The documents that are added are not visible until a commit or an optimize command is issued.

We already saw that documents are the unit of information in Solr. Here, depending on the format of the data, one or more documents are sent using the proper representation.

Since the attributes and the content of the message will be the same regardless of the format, the formal description of the message structure will be given once. The following is an add command in XML format:

<add commitWithin="10000" overwrite="true">
  <doc boost="1.9">
    <field name="id">12020</field>
    <field name="title" boost="2.2">Round around midnight</field>
    <field name="subject">Music</field>
    <field name="subject">Jazz</field>
  </doc>
</add>

Let’s discuss the preceding command in detail:

<add>: This is the root tag of the XML document and indicates the operation.
  commitWithin: This is an alternative to the autocommit features we saw previously. Using this optional attribute, the requestor asks Solr to ensure that the documents will be committed within a given period of time.
  overwrite: This tells Solr to check for and, if present, overwrite documents with the same uniqueKey. If you don’t have a uniqueKey, or you’re confident that you won’t ever add the same document twice, you can get some index performance improvements by explicitly setting this flag to false.
<doc>: This represents the document to be added.
  boost: This is an optional attribute that specifies the boost for the whole document (that is, for each field). It defaults to 1.0.
<field>: This is a field of the document with just one value. If the field is multivalued, there will be several fields with the same name and different values.
  boost: This is an optional attribute that specifies the boost for the specific field. It defaults to 1.0.

The same data can be expressed in JSON as follows:

{
  "add": {
    "commitWithin": 10000,
    "overwrite": true,
    "doc": {
      "boost": 1.9,
      "id": 12020,
      "title": {
        "value": "Round around midnight",
        "boost": 2.2
      },
      "subject": ["Music", "Jazz"]
    }
  }
}

As you can see, the information is the same as in the previous example. The difference is in the encoding of the information according to the JSON format.

Sending add commands

We can issue an add command in several ways: using cURL, the administration console, and a client API such as SolrJ.

The cURL tool is a command-line tool used to transfer data with URL syntax. Among other protocols, it supports HTTP and HTTPS, so it’s perfect for sending commands to Solr. These are some examples of add commands sent using cURL:

# curl http://127.0.0.1:8983/solr/update -H "Content-type:text/xml" --data-binary @datafile.xml

# curl http://127.0.0.1:8983/solr/update -H "Content-type:text/xml" --data-binary '<add commitWithin="10000" overwrite="true">
  <doc boost="1.9">
    <field name="id">12020</field>
    <field name="subject">Jazz</field>
  </doc>
</add>'

The first example uses data contained in a file. The second (useful for short requests) directly embeds the documents in the data-binary parameter. The preceding examples are perfectly valid for JSON and CSV documents as well (obviously, the data format and the content type will change).
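For readers who prefer code to the command line before reaching the SolrJ chapter, the same request can be prepared with the HTTP client that ships with Java 11 and later. This is only a sketch under that assumption; AddCommandRequest is a hypothetical helper, and the URL is the one used in the cURL examples.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Builds the HTTP request that carries an XML add command. */
final class AddCommandRequest {
    static HttpRequest build(String updateUrl, String xmlPayload) {
        return HttpRequest.newBuilder(URI.create(updateUrl))
                // Same header that the cURL examples pass with -H
                .header("Content-Type", "text/xml")
                // Same role as the --data-binary option
                .POST(HttpRequest.BodyPublishers.ofString(xmlPayload))
                .build();
    }

    public static void main(String[] args) {
        String xml = "<add><doc><field name=\"id\">12020</field></doc></add>";
        HttpRequest request = build("http://127.0.0.1:8983/solr/update", xml);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending the request requires a running Solr instance; with one available, java.net.http.HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) would return the response body as a string.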

Delete

A delete command will mark one or more documents as deleted. This means the target documents are not immediately removed from the index. Instead, a kind of tombstone is placed on them; when the next commit event happens, that data will be removed. Commits and optimizes are commands that make the update changes visible and available. In other words, they make those changes effectively part of the Solr index. We will see both of them later.

Solr allows us to identify the target documents in two different ways: by specifying a set of identifiers or by deleting all documents matched by a query. In the same way as we sent add commands, we can use cURL to issue delete commands:

# curl http://127.0.0.1:8983/solr/update -H "Content-type:text/xml" --data-binary @datafile_with_deletes.xml

# curl http://127.0.0.1:8983/solr/update -H "Content-type:text/xml" --data-binary '<delete>
  <id>92392</id>
  <query>publisher:"Ashler"</query>
</delete>'

In the second example, we issued a command to delete:

The document with 92392 as uniqueKey
All documents that have a publisher attribute with the Ashler value

Commit, optimize, and rollback

Changes resulting from add and delete operations are not immediately visible. They must be committed first; that is, a commit command has to be sent.

We already explored hard and soft unsolicited commits in the Update handler and autocommit feature section. The same command can be explicitly sent to Solr by clients.

Although we previously described the difference between hard and soft commits, it’s important to remember that a hard commit is an expensive operation, causing changes to be permanently flushed to disk. Soft commits operate exclusively in memory, and are therefore very fast but transient; so, in the event of a JVM crash, softly committed data is lost.

Tip
In a prototype I’m working on, we index data coming from traffic sensors in Solr. As you can imagine, the input flow is continuous; it can happen several times in a second. A control system needs to execute a given set of queries at short periodic intervals, for example, every few seconds. In order to make the most updated data available to that system, we issue a soft commit every second and a hard commit every 20 minutes. At the moment, this seems to be a good compromise between the availability of fresh data and the risk of data loss (it could still happen during those 20 minutes).

For those interested, the Solr extension we use in that project is available on GitHub, at https://github.com/agazzarini/SolRDF. It allows Solr to index RDF data, and it is a good example of the capabilities of Solr in the realm of customization.

A third kind of commit, which is actually a hard commit, is the so-called optimize. With optimize, other than producing the same results as those of a hard commit, Solr will merge the current index segments into a single segment, resulting in a set of intensive I/O operations. The merge usually occurs in the background and is controlled by parameters such as merge scheduler, merge policy, and merge factor. Like the hard commit, optimize is a very expensive operation in terms of I/O because, apart from costing the same as a hard commit, it must have some temporary space available on the disk to perform the merge.

It is possible to send the commit or the optimize command together with the data to be indexed:

# curl http://127.0.0.1:8983/solr/update?commit=true -H "Content-type:text/xml" --data-binary @datafile.xml

# curl http://127.0.0.1:8983/solr/update?optimize=true -H "Content-type:text/xml" --data-binary @datafile.xml

The message payload can also be a commit command:

# curl http://127.0.0.1:8983/solr/update -H "Content-type:text/xml" --data-binary '<commit/>'

A commit has a few additional Boolean parameters that can be specified to customize the service behavior:

Parameter Description

waitSearcher The command won’t return until a new searcher is opened and registered as the main searcher

waitFlush The command won’t return until uncommitted changes are flushed to disk

softCommit If this is true, a soft commit will be executed

Before committing any pending change, it’s possible to issue a rollback to remove uncommitted add and delete operations. The following are examples of rollback requests:

# curl http://127.0.0.1:8983/solr/update?rollback=true

# curl http://127.0.0.1:8983/solr/update -H "Content-type:text/xml" --data-binary '<rollback/>'

Extending and customizing the index process

As we saw before, the Solr index chain is highly customizable at different points. This section will give you some hints and examples to create your own extension in order to customize the indexing phase.

Changing the stored value of fields

One of the most frequent needs that I encounter while I’m indexing bibliographic data is to correct or change the headings (labels) belonging to the incoming records (documents).

Note
This has nothing to do with the text analysis we have previously seen. Here, we are dealing with unwanted (wrong) values, diacritics that need to be replaced, or in general, labels in the original record that we want to change and show to the end users. In Solr terms, we want to change the stored value of a field before it gets indexed.

Suppose a library has a lot of records and wants to publish them in an OPAC. Unfortunately, many of those records have titles with a trailing underscore, which has a special meaning for librarians. While this is not a problem for the cataloguing software (because librarians are aware of that convention), it is not acceptable to end users, and it will surely be seen as a typo. So if we have records with titles such as “A good old story_” or “This is another title_” in our application, we want to show “A good old story” and “This is another title” without underscores when the user searches for those records.

Remember that analyzers and tokenizers declared in your schema only act on the indexed value of a given field. The stored value is copied verbatim as it arrives, so text analysis gives us no chance to modify it.

In these cases, an UpdateRequestProcessor perfectly fits our needs. The example project associated with this chapter contains several examples of custom UpdateRequestProcessors. Here, we are interested in RemoveTrailingUnderscoreProcessor, which can be found in src/main/java within the org.gazzax.labs.solr.ase.ch2.urp package.

As you can see, writing an UpdateRequestProcessor requires two classes to be implemented:

Factory: A class that extends org.apache.solr.update.processor.UpdateRequestProcessorFactory
Processor: A class that extends org.apache.solr.update.processor.UpdateRequestProcessor

The first is a factory that creates concrete instances of your processor and can be configured with a set of custom parameters in solrconfig.xml:

<processor class="org.gazzax.labs.solr.ase.ch2.urp.RemoveTrailingUnderscoreProcessorFactory">
  <arr name="fields">
    <str name="fields">title</str>
    <str name="fields">author</str>
  </arr>
</processor>

In this case, instead of hardcoding the names of the fields that we want to check, we define an array parameter called fields. That parameter is retrieved in the factory, specifically in the init() method, which will be called by Solr when the factory is instantiated:

private String[] fields;

@Override
public void init(NamedList args) {
  SolrParams parameters = SolrParams.toSolrParams(args);
  this.fields = parameters.getParams("fields");
}

The other relevant section of the factory is in the getInstance method, where a new instance of the processor is created:

@Override
public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse res,
    UpdateRequestProcessor next) {
  return new RemoveTrailingUnderscoreProcessor(next, fields);
}

A new processor instance is created with the next processor in the chain and the list of target fields we configured. Now the processor receives those parameters and can add its contribution to the index phase. In this case, we want to put some logic before the add phase:

@Override
public void processAdd(final AddUpdateCommand command) throws IOException {
  // 1. Retrieve the Solr(Input)Document
  SolrInputDocument document = command.getSolrInputDocument();
  // 2. Loop through target fields
  for (String name : fields) {
    // 3. Get the field value
    // we assume target fields are monovalued (and textual) for simplicity
    String value = (String) document.getFieldValue(name);
    // 4. Check and, if needed, change the value
    if (value != null && value.endsWith("_")) {
      String newValue = value.substring(0, value.length() - 1);
      document.setField(name, newValue);
    }
  }
  // 5. IMPORTANT: forward to the next processor in the chain
  super.processAdd(command);
}
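The cleaning rule itself does not depend on Solr, so it can be tried in isolation. The sketch below applies the same logic to a plain map standing in for the input document; TrailingUnderscoreCleaner is a hypothetical class introduced only for this illustration and is not part of the example project.

```java
import java.util.List;
import java.util.Map;

/** Applies the trailing-underscore rule to a map that stands in for a document. */
final class TrailingUnderscoreCleaner {
    static void clean(Map<String, Object> document, List<String> fields) {
        for (String name : fields) {
            Object value = document.get(name);
            // Only single textual values are handled, as in the processor above
            if (value instanceof String) {
                String text = (String) value;
                if (text.endsWith("_")) {
                    document.put(name, text.substring(0, text.length() - 1));
                }
            }
        }
    }
}
```

Fields that are not listed, missing, or not textual are left untouched, which mirrors the null check in the processor.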

Tip
You can find the source code of the whole example under the org.gazzax.labs.solr.ase.ch2.urp package of the source folder in the project associated with this chapter. The package contains additional examples of UpdateRequestProcessor.

Indexing custom data

The default UpdateRequestHandler is very powerful because it covers the most popular formats of data. However, there are some cases where data is available in a legacy format. Hence, we need to do something in order to have Solr working with that.

In this example, I will use a flat file, that is, a simple text file that typically describes records with fields of data defined by fixed positions. They are very popular in integration projects between banks and ERP systems (just to give you a concrete context).

Tip
In the example project associated with this chapter, you can find an example of such a file describing books under the src/solr/solr-homes/flatIndexer/example-input-data folder.

Here, each line has a fixed length of 107 characters and represents a book, with the following format:

Parameter Position

Id 0 to 8

ISBN 8 to 22

Title 22 to 67

Author 67 to 106

There are two approaches in this scenario. The first moves the responsibility to the client side, thus creating a custom indexer client that gets the data in any format and carries out some manipulation to convert it into one of the supported formats. We won’t cover this scenario right now, as we will discuss client APIs in a later chapter.

Another approach could be a custom extension of the UpdateRequestHandler. In this case, we want to have a new content type (text/plain) and a corresponding custom handler to load that kind of data. There are two things we need to implement. The first is a subclass of the existing UpdateRequestHandler:

public class FlatDataUpdate extends UpdateRequestHandler {
  @Override
  protected Map<String, ContentStreamLoader> createDefaultLoaders(NamedList n) {
    Map<String, ContentStreamLoader> registry = new HashMap<String, ContentStreamLoader>();
    registry.put("text/plain", new FlatDataLoader());
    return registry;
  }
}

Here, we are simply overriding the content type registry (the registry in the superclass cannot be modified) so that it contains our content type, with a corresponding loader called FlatDataLoader. This class extends ContentStreamLoader and implements the parsing logic of the flat data:

public class FlatDataLoader extends ContentStreamLoader

The custom loader must provide a load(…) method to implement the stream parsing logic:

@Override
public void load(
    SolrQueryRequest req,
    SolrQueryResponse rsp,
    ContentStream stream,
    UpdateRequestProcessor processor) throws Exception {
  // 1. Get a reader associated with the content stream
  BufferedReader reader = null;
  try {
    reader = new BufferedReader(stream.getReader());
    String actLine = null;
    while ((actLine = reader.readLine()) != null) {
      // 2. Sanity check: check line length
      if (actLine.length() != 107) {
        continue;
      }
      // 3. Parse and create the document
      SolrInputDocument doc = new SolrInputDocument();
      doc.setField("id", actLine.substring(0, 8));
      doc.setField("isbn", actLine.substring(8, 22));
      doc.setField("title", actLine.substring(22, 67));
      doc.setField("author", actLine.substring(67));
      AddUpdateCommand command = getAddCommand(req);
      command.solrDoc = doc;
      processor.processAdd(command);
    }
  } finally {
    // Close the reader
    if (reader != null) {
      reader.close();
    }
  }
}
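The fixed-position parsing can also be exercised without a running Solr. The following sketch uses the same offsets as the loader above and returns a plain map instead of a SolrInputDocument; FlatRecordParser is a hypothetical name, and trimming the padding of each field is an assumption of this sketch (the loader shown above keeps the raw substrings).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Parses one fixed-width line into named fields, using the offsets above. */
final class FlatRecordParser {
    static final int LINE_LENGTH = 107;

    static Optional<Map<String, String>> parse(String line) {
        // Same sanity check as the loader: skip lines of unexpected length
        if (line == null || line.length() != LINE_LENGTH) {
            return Optional.empty();
        }
        Map<String, String> record = new LinkedHashMap<>();
        // Assumption: padding spaces are trimmed from each value
        record.put("id", line.substring(0, 8).trim());
        record.put("isbn", line.substring(8, 22).trim());
        record.put("title", line.substring(22, 67).trim());
        record.put("author", line.substring(67).trim());
        return Optional.of(record);
    }

    public static void main(String[] args) {
        // Build a 107-character sample line: 8 + 14 + 45 + 40 columns
        String line = String.format("%-8s%-14s%-45s%-40s",
                "1", "9780000000001", "A good old story", "An author");
        System.out.println(parse(line));
    }
}
```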

If you want to try this example, just open the command line in the folder of the project associated with this chapter, and run the following command:

# mvn cargo:run -PflatIndexer

Tip
You can do the same with Eclipse by creating a new Maven launch as previously described. In that case, you will also be able to put debug breakpoints in the source code (your source code and the Solr source code) and proceed step by step in the Solr index process.

Once Solr has started, open another shell, change the directory to go to the project folder, and run the following command:

# curl http://127.0.0.1:8983/solr/flatIndexer/update?commit=true -H "Content-type:text/plain" --data-binary @src/solr/solr-homes/flatIndexer/example-input-data/books.flat

You should see something like this in the console:

[UpdateHandler] start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
[SolrCore] SolrDeletionPolicy.onCommit: commits: num=2
[SolrCore] newest commit generation = 4
[SolrIndexSearcher] Opening Searcher@77ee04bb[flatIndexer] main
[UpdateHandler] end_commit_flush

Now open the administration console at http://127.0.0.1:8983/solr/#/flatIndexer/query, and click on the Execute Query button. You should see three documents in the right pane.

Tip
You can find the source code of the entire example under the org.gazzax.labs.solr.ase.ch2.handler package of the source folder in the project associated with this chapter.

Troubleshooting

This section provides suggestions and tips on how to resolve some common problems encountered when dealing with indexing operations.

Multivalued fields and the copyField directive

The cardinality of a field can be tricky, especially when used in conjunction with copyField directives, where two or more single-valued fields are copied to another field, like this:

<field name="author_person" … required="true"/>
<field name="author_corporate" … required="true"/>
<field name="author_search" … multiValued="true"/>
<copyField source="author_person" dest="author_search"/>
<copyField source="author_corporate" dest="author_search"/>

In this case, the destination field must be multivalued. Otherwise, it will receive two values from two different source fields, and Solr will refuse to index the whole document, showing an error such as: multiple values encountered for non multiValued field author_search.

The copyField input value

A common misunderstanding with the copyField directive is related to the value that is being copied from the source to the dest field. Suppose you define field A, field B, and a copyField directive from A to B:

<field name="A" type="text_without_stopwords" …/>
<field name="B" type="light_stemmed_text" …/>
<copyField source="A" dest="B"/>

Irrespective of the text analysis we defined for field A and field B, field B will get the incoming value of field A, without any text analysis applied. In other words, the incoming value for field A is copied verbatim to field B before any text analysis is applied to it.

So, suppose we have a value of “one and two” for field A, and “and” is considered a stop word. The “one and two” value is injected into field A, which will trigger the text analysis for the text_without_stopwords type, therefore resulting in an indexed value (for field A) composed of two tokens: “one” and “two” (“and” has been removed).

Next, the original value of field A (“one and two”) is copied to field B, triggering the text analysis associated with that field.
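The order of operations can be mimicked with two toy analyzers. This is only a model of the behavior described above: CopyFieldDemo, its stop word list, and the whitespace-only analysis used for field B are assumptions made for the illustration, and none of it reflects the real analysis chains.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Toy model: the copy destination is analyzed from the raw source value. */
final class CopyFieldDemo {
    private static final Set<String> STOP_WORDS = Set.of("and", "or", "the");

    /** Stands in for the analysis of field A (stop words removed). */
    static List<String> analyzeWithoutStopWords(String raw) {
        return Arrays.stream(raw.split("\\s+"))
                .filter(token -> !STOP_WORDS.contains(token))
                .collect(Collectors.toList());
    }

    /** Stands in for the analysis of field B (whitespace split only). */
    static List<String> analyzeWhitespaceOnly(String raw) {
        return Arrays.asList(raw.split("\\s+"));
    }

    /** Tokens indexed for B when A is copied to B. */
    static List<String> tokensForCopyDestination(String rawValueOfA) {
        // The raw incoming value is handed over, not the tokens produced for A
        return analyzeWhitespaceOnly(rawValueOfA);
    }

    public static void main(String[] args) {
        String incoming = "one and two";
        System.out.println("A: " + analyzeWithoutStopWords(incoming));
        System.out.println("B: " + tokensForCopyDestination(incoming));
    }
}
```

In this model, field B still sees the “and” token, because it never receives the output of the analysis applied to field A.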

Required fields and the copyField directive

A required attribute on a static field denotes that an incoming document must contain a valid value for that field. If a field is the target or destination of a copyField directive, the required attribute means that, in some way, there should be a value for that field coming from its sources. See the following example:

<field name="A" … required="false"/>
<field name="B" … required="false"/>
<field name="C" … required="true" multiValued="true"/>
<copyField source="A" dest="C"/>
<copyField source="B" dest="C"/>

Fields A and B are not required and they are copied into field C. Since field C is mandatory, you have to make sure that, for each input document, at least one of A and B has a valid value; otherwise, Solr will complain about a missing value for field C.

Stored text is immutable!

A stored field value is the text that comes from the Solr(Input)Document. It will be copied verbatim, exactly as it arrives, without any changes. Any text analysis configured in the schema for a given field type won’t affect that value.

In other words, the stored value won’t be changed at all by Solr during the index phase.

Data not indexed

The design of UpdateRequestProcessor follows the decorator pattern, consisting of a nested chain of responsibility where each link is executed one after the other. Your custom UpdateRequestProcessor will get a reference to the next processor in the chain during its lifecycle. Once its work has been done, it is crucial to forward the execution flow to the next processor. Otherwise, the chain will be interrupted and no data will be indexed.
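The effect of a missing forward call is easy to reproduce with a tiny chain written in plain Java. The classes below are hypothetical stand-ins for processors and are not Solr types; the last link plays the role of the step that actually writes to the index.

```java
import java.util.ArrayList;
import java.util.List;

/** Base link: by default it only forwards to the next link, if any. */
class ChainLink {
    private final ChainLink next;

    ChainLink(ChainLink next) {
        this.next = next;
    }

    void processAdd(String document, List<String> indexed) {
        if (next != null) {
            next.processAdd(document, indexed);
        }
    }
}

/** Well-behaved link: does its work, then forwards. */
class ForwardingLink extends ChainLink {
    ForwardingLink(ChainLink next) {
        super(next);
    }

    @Override
    void processAdd(String document, List<String> indexed) {
        super.processAdd(document.trim(), indexed);
    }
}

/** Buggy link: does its work but never forwards. */
class ForgetfulLink extends ChainLink {
    ForgetfulLink(ChainLink next) {
        super(next);
    }

    @Override
    void processAdd(String document, List<String> indexed) {
        String cleaned = document.trim(); // the result is computed, then dropped
    }
}

/** Last link: stands in for the step that writes to the index. */
class IndexingLink extends ChainLink {
    IndexingLink() {
        super(null);
    }

    @Override
    void processAdd(String document, List<String> indexed) {
        indexed.add(document);
    }
}

class ChainDemo {
    public static void main(String[] args) {
        List<String> good = new ArrayList<>();
        new ForwardingLink(new IndexingLink()).processAdd(" doc-1 ", good);
        List<String> bad = new ArrayList<>();
        new ForgetfulLink(new IndexingLink()).processAdd(" doc-1 ", bad);
        System.out.println(good.size() + " indexed vs " + bad.size() + " indexed");
    }
}
```

Running ChainDemo prints "1 indexed vs 0 indexed": the document that passes through the forgetful link never reaches the indexing step.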

Summary

In this chapter, we saw the main concepts of the indexing phase in Solr. Being an inverted-index-based search engine, Solr strongly relies on the indexing phase by allowing a customizable and tunable index chain.

The Solr write path is a chain of responsibility consisting of several actors, each of them with a precise role in the overall process. While you must know, configure, and control those components as a user, you must also be aware of their high level of extensibility (as a developer). This allows you to adapt and, if needed, customize a Solr instance according to your specific needs.

We addressed the concepts that form the Solr data model, such as documents, core, schema, fields, and types. We also looked at the indexing configuration and the involved components such as update request processors, update chains, and request handlers. We finally described how to configure these components and write extensions on top of them.

The purpose of the indexing phase and the index itself is to optimize speed and performance in finding relevant documents during searches. Hence, the whole process is not useful without the search phase, which is the subject of the next chapter.

Chapter3.SearchingYourDataOncedatahasbeenproperlyindexed,it’sdefinitelytimetosearch!Theindexingphasemakesnosenseifthingsendthere.Dataisindexedmainlytospeedupandfacilitatesearches.

ThischapterfocusesonsearchcapabilitiesofferedbySolrandillustratestheseveralcomponentsthatcontributetoitsreadpath.

Thechapterwillcoverthefollowingtopics:

QueryingSearchconfigurationTheSolrreadpath:queryparsers,searchcomponents,requesthandlers,andresponsewritersExtendingSolrTroubleshooting

www.it-ebooks.info

ThesampleprojectThroughoutthischapter,wewilluseasampleSolrinstancewithaconfigurationthatincludesallthetopicswewillgraduallydescribe.Thisinstancewillhaveasetofsimpledocumentsrepresentingmusicalbums.Thesearethefirstthreedocuments:

<doc>
  <field name="id">1</field>
  <field name="title">A Modern Jazz Symposium of Music and Poetry</field>
  <field name="composer">Charles Mingus</field>
</doc>
<doc>
  <field name="id">2</field>
  <field name="title">Where Jazz meets Poetry</field>
  <field name="artist">Raphael Austin</field>
</doc>
<doc>
  <field name="id">3</field>
  <field name="title">I'm In The Mood For Love</field>
  <field name="composer">Charlie Parker</field>
  <field name="genre">Jazz</field>
</doc>

The source code of the sample project associated with this chapter contains the entire Maven project, which can be either loaded in Eclipse or used via the command line. As a preliminary step, open a shell (or run the following command within Eclipse) in the project folder and type this:

# mvn clean cargo:run -Pquerying

The preceding command will start a new Solr instance, with sample data preloaded.

Tip

The sample data is automatically loaded at startup by means of a custom SolrEventListener. You can find the source code under the org.gazzax.labs.solr.ase.ch3.listener package.

You can use the page located at http://127.0.0.1:8983/solr/#/example/query to try out and experiment with the several things we will discuss.

Tip

If you loaded the project in Eclipse, under /src/dev/eclipse you will find the launch configuration used to start Solr.

Querying

Solr can be seen as a tell-and-ask system; that is, you first put in (index) some data, then it can answer questions you ask (query) about that data. Since the actors involved in these interactions are not humans, Solr provides a formal and systematic way to execute both index and query operations. Specifically, from a query perspective, that requires a specialized language that can be interpreted by Solr in order to produce the expected answers. Such a language is usually called a query language.

Search-related configuration

The solrconfig.xml file has a <query> section that contains several search settings. Most of them are related to caches, a critical topic that will be described in Chapter 5, Administering and Tuning Solr.

As we already said for the index section, all those parameters have good defaults that work well in a lot of scenarios. This list describes the relevant settings (cache settings are not included):

Searcher lifecycle listeners: Whenever a searcher is opened, it’s possible to configure one or more queries that will be automatically executed in order to prepopulate caches.
Use cold searcher: If a search is issued and there isn’t a registered searcher, the current warming searcher is immediately used. If this attribute is set to false, the incoming request will wait until the warming completes.
Max warming searchers: This is the maximum number of searchers that are warming in parallel. The example configuration contains a value of 2, which is good for pure searcher instances. For indexers (which could also be searchers), a higher value could be needed.

Query analyzers

In the previous chapter, we discussed analyzers. Their meaning here is the same, and the difference resides only in their input value. When we index data, that value is the content of the fields that make up the input documents. At query time, the analyzer processes a value, term, or phrase coming from a query parser and representing a component piece of the user-entered query.

Tip

In the previous chapter, we used the analysis page to see how text analysis works at index time. That very page has an additional section that can be used to see the same process but using the query analyzer.

Common query parameters

A query to Solr, other than a search string, includes several parameters that are passed using standard HTTP procedures, that is, name/value pairs in the query string, like this:

http://127.0.0.1:8080/solr/ch3/search?q=history&start=10&rows=10&sort=title asc

While some of them strictly depend on the component that will be in charge of handling the request, there are sets of common parameters. The following table describes them:

Parameter    Description
q    The search string that indicates what we are asking Solr, according to a given syntax.
start    The start offset within search results. This is used to paginate search results.
rows    The maximum size (that is, number of documents) of the returned page.
sort    A comma-separated list of (indexed) fields that will be used to sort search results. Each field must be followed by the keyword asc (for ascending order) or desc (descending order).
defType    Indicates the query parser that will interpret the specific search string. Each query parser has different features and different rules and accepts a different syntax in queries.
fl    A comma- or space-separated list of fields that will be returned as part of the matched documents.
fq    A filter query. The parameter can be repeated.
wt    The response output writer that will determine the response output format.
debugQuery    If this is true, an additional section will be appended to the response with an explanation of the current read path.
explainOther    The unique key of a document that is not part of search results for a given query. Solr will add a section to the response explaining why the document associated with that identifier has been excluded from search results.
timeAllowed    A constraint on the maximum amount of time allowed for query execution. If the timeout expires, Solr will return only partial results.
cache    Enables or disables query caching.
omitHeader    By default, the response contains an information header that contains some metadata about the query execution (for example, input parameters or query execution time). If this parameter is set to true, then the header is omitted in the response.

The following are some example queries:

http://localhost:8983/solr/example/query?q=charles&fq=genre:jazz&rows=5&omitHeader=true&debugQuery=true
http://localhost:8983/solr/example/query?q=charles&rows=10&omitHeader=true&debugQuery=true&explainOther=2
http://localhost:8983/solr/example/query?q=*:*&start=5&rows=5
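When such URLs are assembled by a program rather than typed by hand, the parameter values need to be encoded. The following Python sketch shows one possible way to do that with the standard library only; the build_query_url helper is hypothetical and the base URL is simply assumed to be the sample instance of this chapter.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Assumed base URL of the sample instance used in this chapter.
BASE_URL = "http://localhost:8983/solr/example/query"


def build_query_url(q, **params):
    """Return a request URL for the given search string and common parameters.

    Repeatable parameters (such as fq) can be passed as a list.
    """
    pairs = [("q", q)]
    for name, value in params.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        pairs.extend((name, v) for v in values)
    # urlencode percent-encodes reserved characters and turns spaces into "+".
    return BASE_URL + "?" + urlencode(pairs)


url = build_query_url(
    "charles",
    fq=["genre:jazz"],
    rows=5,
    sort="title asc",
    omitHeader="true",
)
print(url)
```

Passing fq as a list keeps the door open for several filter queries in the same request.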

As you can imagine, the q parameter, which contains the query, will be very important in this chapter. Besides this, there are two other parameters, fl (field list) and fq (filter queries), that will be described in the next sections, because they have some interesting aspects.

Field lists

The fl parameter indicates which fields (among fields that have been marked as stored) will be returned in documents within a query response. Think of these two scenarios:

A schema that contains a lot of fields, probably defining multiple entities (for example, books and authors). I’m looking for books so I don’t want to see any author attributes (and vice versa).
A schema that contains stored fields with a lot of text, used for the highlighting component, for example (it requires that highlight snippets come from a stored field). When I execute queries I don’t want those fields to be returned as part of the matching documents. In other words: I want to exclude those fields from search results.

The fl parameter specifies the list of fields that will make up each matched document, thus filtering out unwanted attributes. The parameter accepts a space- or comma-separated list of values, where each value can be any of the following:

A field name (for example, title, artist, released, and so on).
The literal score, which is a virtual field indicating the computed score for each document.
A glob, which is an expression that dynamically matches one or more fields by means of the * and ? wildcard characters (for example, art*, r?leas?d, and re?leas*).
The asterisk (*) character, which matches all available (that is, stored) fields.
A function that, when evaluated, will produce a value for a virtual field that will be added to documents.
A transformer. Like a function, this is another way to create virtual fields in documents, with additional data such as the Lucene document ID, shard identifier, or the query execution explanation.

Explicit fields, score, functions, and transformers can be aliased by prefixing them with a name that will be used in place of the real name of that member.

Tip

SOLR-3191 tracks the activity related to a so-called field exclusion feature. Once this patch has been applied, it will be possible to explicitly indicate which fields must not be part of the returned documents.

The following table lists some examples of the fl parameter:

Example    Description
*,score    All stored fields and the score virtual field
t*,*d    All fields starting with t, plus all fields ending with d
max(old_price,new_price)    Maximum value between old_price and new_price
max_price:max(p1,p2)    A function alias
title,t_alias:title,[docid]    Title, aliased title, and a transformer

The difference between the third and fourth examples in the preceding table is in the name of the field that will hold the function value. In the third example, it will be the function expression itself; in the fourth, it will be a virtual field called max_price.

Tip

With the sample instance running, you can try these examples by issuing a request such as http://127.0.0.1:8983/solr/example/query?q=id:1&fl=, replacing the value of the fl parameter.

A complete list of available functions can be accessed at http://wiki.apache.org/solr/FunctionQuery#Available_Functions.

A complete list of available transformers can be read at https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents.
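To get a feel for how globs select fields, here is a rough Python sketch. It is only an approximation of the behavior described in this section: the select_fields helper and the list of stored fields are hypothetical, and Python's fnmatch module also understands bracket expressions, which are not part of the syntax discussed here.

```python
from fnmatch import fnmatchcase


def select_fields(stored_fields, fl):
    """Return the stored field names selected by a comma- or space-separated fl value.

    Only plain names and globs are handled; score, functions, aliases and
    transformers are out of scope for this sketch.
    """
    patterns = fl.replace(",", " ").split()
    return [
        name
        for name in stored_fields
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


# Hypothetical set of stored fields.
stored = ["id", "title", "artist", "released", "genre"]

print(select_fields(stored, "t*,*d"))
print(select_fields(stored, "r?leas?d"))
print(select_fields(stored, "*"))
```

A field is returned as soon as it matches any one of the listed patterns, so each additional pattern can only widen the selection.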

Filter queries

Filter queries operate a kind of intersection on top of documents resulting from the execution of the main query. A filter query is like having a required condition in your main query (that is, an additional clause concatenated with the AND operator), but with some important differences:

It is executed separately and before the main query
The filter and the intersection are applied on top of the main query results
It doesn’t influence the score of the documents, which is computed in the execution of the main query
The results of filter queries are cached separately so that they can be reused for further executions

There can be more than one fq parameter in a search query. In this case, the result of the overall execution will take into account all filter clauses, therefore resulting in documents that satisfy the intersection between the main results and the results of each filter query.

Filter query caching is one of the most crucial features of Solr. A filter query’s design should reflect the access pattern of requestors as much as possible. Consider this filter query:

fq=genre:Jazz AND released:1981

The preceding query will cache the results of those two clauses together. So, if your application provides two separate filters (for the end users), genre and released, the following filter queries won’t benefit from this cache, and they will be cached (again) separately:

fq=genre:Jazz
fq=released:1981

In this situation, the first query should be rewritten in the following way, allowing reuse of the cache associated with each filter query:

fq=genre:Jazz&fq=released:1981
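The reuse argument can be made concrete with a toy model. The following Python sketch is not how Solr implements its filter cache; FilterCache and run_request are hypothetical names, and the only assumption carried over from the text is that each distinct fq value gets its own cache entry.

```python
class FilterCache:
    """Toy stand-in for a filter cache: one entry per distinct fq string."""

    def __init__(self):
        self.entries = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, fq):
        if fq in self.entries:
            self.hits += 1
        else:
            self.misses += 1
            # A real engine would store the set of matching documents here.
            self.entries[fq] = "doc set for " + fq
        return self.entries[fq]


def run_request(cache, filter_queries):
    for fq in filter_queries:
        cache.lookup(fq)


# Combined clause first, then the two separate filters: nothing is reused.
combined = FilterCache()
run_request(combined, ["genre:Jazz AND released:1981"])
run_request(combined, ["genre:Jazz"])
run_request(combined, ["released:1981"])

# Separate fq parameters from the start: later requests hit the cache.
separate = FilterCache()
run_request(separate, ["genre:Jazz", "released:1981"])
run_request(separate, ["genre:Jazz"])
run_request(separate, ["released:1981"])

print(combined.hits, combined.misses)
print(separate.hits, separate.misses)
```

With the combined clause, three entries are created and none is ever reused; with separate filters, two entries serve all three requests.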

Query parsers

A query parser is a component responsible for translating a search string or expression into specific instructions for Solr. Every query parser understands a given syntax for expressing queries.

Solr comes with several query parsers, giving the requestors a wide range of ways of asking what they need.

The Solr query parser

The Solr query parser, often mistakenly called Lucene query parser, is implemented in org.apache.solr.search.LuceneQParserPlugin. It is rather a schema-driven superset of the default Lucene query parser.

Note

Note the Plugin suffix of the class name. Solr provides an extensible framework for creating and plugging in your own query parser.

The following sections will describe the relevant aspects of this parser.

Terms, fields, and operators

You’ve already met terms. They are atomic units of information resulting from an analysis applied to given text. At index time, that text is the value of a field belonging to a given (input) document. At query time, terms come from the user-entered query string. Specifically, a query string is broken into terms, fields, and operators.

Terms can be simple or compound terms; for example, they can be single words such as CM, Standard, and 1959 or phrases such as “Goodbye Pork Pie Hat.” Phrases are two or more words surrounded by double quotes.

Fields are what we declared in the schema.xml file. Their use within a search string allows a requestor to express instructions such as “search x in field y” where x is a term or a phrase and y is the field name. Here are some examples of the use of fields:

title:"Where Jazz meets Poetry"
composer:Mingus

Operators are keywords or symbols used as conjunctions between several field-value criteria in order to create complex expressions, such as this:

title:Jazz OR composer:Charlie AND released:1959
genre:Jazz AND NOT released:1959

The following table describes the available operators:

Operator    Description
AND    A conjunction between two criteria, both of which must be satisfied
OR    A conjunction between two criteria where at least one must be satisfied
+    Marks a term as required
-/NOT    Marks a term as prohibited

It’s also possible to use a pair of parentheses to group several fields or values criteria, like this:

(released:1957 AND composer:Mingus) OR (released:1976 AND NOT genre:Jazz) OR released:(1988 OR 1959)

Boosts

Boosting allows you to control the relevance of a given matching document, thus offering a way to give some query results more importance than others; for example, if you are mainly interested in Jazz and less in Fusion albums, you could use this:

+genre:Fusion +genre:Jazz^2

The boost factor is inserted after a field value criterion and prefixed with a caret symbol. It has to be greater than 0, and since it is a factor, a value between 0 and 1 represents a negative boost. If it is absent, a default boost factor of 1 will be applied.

Wildcards

The wildcard characters, * and ?, can be used within terms, with zero or more occurrences. They cannot be applied to compound terms (that is, search phrases) or numeric and date types. The ? wildcard matches a single character, while the * matches zero or more sequential characters. Here are some examples of wildcards:

(title:moder* AND artist:Min*) OR artist:(Yngw?e AND M?lm*)

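Fuzzy

The tilde symbol (~) at the end of a term enables a so-called fuzzy query, allowing you to match terms that are similar to that term. Fuzzy matching is based on the Damerau-Levenshtein distance algorithm. After the tilde, you can put an integer between 0 and 2, indicating the maximum number of edits allowed between the search term and a matching term (0 means that only exact matches are accepted). If the value is not given, the maximum of 2 is used. A fractional value between 0 and 1, the older way of expressing a required similarity, is still accepted and is converted to an edit distance.

With the example Solr instance running, open the query page in the admin console and type the following query:

artist:Charles~0.7

The query response will contain two results. The first is an album of Charles Mingus, which is a perfect match for the search term entered. The second artist is Charlie Parker, whose name is similar but not equal to Charles.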
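To see how far apart the two names in this example really are, the distance can be computed by hand. The following Python function is a minimal sketch of the Damerau-Levenshtein distance (in its optimal string alignment variant); it is meant as an illustration of the measure, not as the implementation used by Lucene, and the terms are lowercased on the assumption that the field is analyzed that way.

```python
def edit_distance(a, b):
    """Damerau-Levenshtein distance (optimal string alignment variant).

    Counts insertions, deletions, substitutions and transpositions of two
    adjacent characters, each as a single edit.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


print(edit_distance("charles", "charles"))  # 0
print(edit_distance("charles", "charlie"))  # 2
```

Two single-character edits separate charles from charlie, which is why the second artist is still considered similar enough to match.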

Proximity

The same symbol that is used for a fuzzy query has a different meaning when used in conjunction with phrase queries. Now run the following query:

title:"Jazz Poetry"

You won’t get any result because there’s no record with those two consecutive terms in the title. Using a tilde followed by a number, which expresses a distance between terms, you can enable a proximity search, allowing matches of documents that have those two terms within a specific distance from one another.

This query will match the document that has Where Jazz meets Poetry as its title:

title:"Jazz Poetry"~2

The following query will also match the document that has A Modern Jazz Symposium of Music and Poetry as the title:

title:"Jazz Poetry"~4
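The distances used in these two queries can be checked with a short calculation. The following Python sketch is a deliberate simplification: the required_slop helper is hypothetical, it handles only two-term phrases whose terms appear in order, and it assumes plain whitespace tokenization with lowercasing and no stop word removal.

```python
def required_slop(title, first, second):
    """Smallest slop that lets the phrase "first second" match the title.

    Simplifying assumptions: whitespace tokenization, lowercasing, no stop
    word removal, and both terms appearing once, in the same order as in
    the phrase. Returns None when that is not the case.
    """
    tokens = title.lower().split()
    first, second = first.lower(), second.lower()
    if first not in tokens or second not in tokens:
        return None
    gap = tokens.index(second) - tokens.index(first)
    if gap < 1:
        return None
    # Adjacent terms (gap == 1) form an exact phrase and need no slop.
    return gap - 1


titles = [
    "Where Jazz meets Poetry",
    "A Modern Jazz Symposium of Music and Poetry",
]
for title in titles:
    print(title, "->", required_slop(title, "Jazz", "Poetry"))
```

Under these assumptions the first title needs a slop of 1 and the second a slop of 4, which is consistent with the first query matching one document and the second query matching both.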

Ranges

Range searches allow us to specify, for a given field, a set of matching values that fall between a lower and a higher bound, inclusive or exclusive of those bounds. Here are some examples of ranges:

released:[1957 TO 1988]
released:[1957 TO *]
released:[* TO 1988]
released:{1957 TO 1988}
released:[1957 TO 1988}
genre:[Jazz TO New Age]

You can see that the lower and higher bounds can be literal values, as shown in the first example, where we are searching for albums released between 1957 and 1988. The bounds can also be wildcards, as shown in the second and third examples. Square and curly brackets are used to denote an included or an excluded bound, respectively. So, in the first example, both 1957 and 1988 are included; in the fourth example they are excluded.

Keep in mind that, for non-numeric fields (as shown in the last example in the preceding code snippet), values are compared lexicographically. Therefore, a sequence such as 1, 02, 14, 100 will result in 02, 1, 100, 14 using the lexicographic order, which is very different from a numeric sort.
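The difference between the two orderings is easy to reproduce. The following Python snippet sorts the same four values first as strings and then as numbers; it only illustrates the ordering rules and says nothing about how Solr stores either kind of field.

```python
values = ["1", "02", "14", "100"]

# Strings are compared character by character, as in a non-numeric field.
lexicographic = sorted(values)

# Converting to integers first gives the numeric order.
numeric = sorted(values, key=int)

print(lexicographic)  # ['02', '1', '100', '14']
print(numeric)        # ['1', '02', '14', '100']
```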

The Disjunction Maximum query parser

The Solr query parser is powerful when it comes to building complex expressions. However, those are quite far from what the user usually types in a search field.

Think about the Google search page. What do you type in the search text field? Not an expression, but just one, two, or more terms associated with what you’re looking for.

The Disjunction Max (DisMax) query parser directly processes those user-entered terms and searches for each of them across a set of configurable target fields, with a configurable weight for each field.

Note

The DisMax parser is enabled by setting the defType parameter to dismax.

The example Solr instance has a request handler listening to /glike1 that uses the DisMax parser.

Other than search terms, this query parser supports some features of the Solr query parser, such as quotes, which can be used to indicate phrases, and the + and - operators to mark mandatory and prohibited terms, respectively. All other term modifiers we saw for the Solr query parser are escaped, so they will be interpreted as search terms.

The name of the parser comes from its behavior:

Dis: This stands for disjunction, which means that, for each word in the query string, the parser builds a new subquery across fields and boosts specified in the qf parameter. The resulting queries are subjected to the first (required) constraint defined with the mm parameter, and a set of optional clauses defined with other parameters, which we will see later.
Max: This means maximum, and it pertains to the scoring computation. The DisMax parser scores a given document by getting the maximum score value among all matching subqueries.

The following sections describe the several parameters that the parser accepts.

Query Fields

The qf parameter indicates a set of target fields with their corresponding (optional) boosts. Fields are separated by spaces, and each of them can have an optional boost associated with it, hence resulting in expressions such as this:

qf=title^3.5 artist^2.0 genre^1.5 released

Here, we want to search across four fields, each of them with a different importance, which will affect the score assigned to each matching document. The qf parameter is one of the main places where we define our search strategy, depending on customer requirements.

Tip

In OPACs, there’s a never-ending debate about which is the more relevant attribute among titles and subjects. A title, as you can imagine, is important, but might not contain terms that are representative of a work. A subject is a kind of controlled classification assigned by a professional user (that is, a librarian). As a search service provider, you can use the qf parameter to configure boosts, depending on customer needs, and avoid entering that debate!

The DisMax query parser has another interesting feature when searching fields declared in the qf parameter: when those fields are numeric or dates, inappropriate terms are dropped. Returning to the qf expression, consider searching for this:

Mingus 1962

For the title, artist, and genre fields, Solr will build two queries. But for the released field, it will create just one query using the 1962 word, thus resulting in a total of 7 queries:

title:Mingus^3.5, artist:Mingus^2.0, genre:Mingus^1.5, title:1962^3.5, artist:1962^2.0, genre:1962^1.5, released:1962

As you can see, the released:Mingus query has been dropped because released is a numeric field.
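The expansion just described can be mimicked in a few lines. The following Python sketch is only a model of the idea: the expand function is hypothetical, the numeric check is reduced to "the term consists of digits only", and the clauses are produced as plain strings rather than real query objects.

```python
def expand(terms, qf, numeric_fields):
    """Build one field:term clause per (term, field) pair.

    qf maps each field name to its boost (None when no boost is given).
    Terms that are not made of digits are skipped for numeric fields.
    """
    clauses = []
    for term in terms:
        for field, boost in qf.items():
            if field in numeric_fields and not term.isdigit():
                continue  # e.g. released:Mingus makes no sense
            clause = f"{field}:{term}"
            if boost is not None:
                clause += f"^{boost}"
            clauses.append(clause)
    return clauses


qf = {"title": 3.5, "artist": 2.0, "genre": 1.5, "released": None}
clauses = expand(["Mingus", "1962"], qf, numeric_fields={"released"})
for clause in clauses:
    print(clause)
```

Two terms across four fields would give eight clauses; dropping the one that pairs a word with the numeric field leaves seven.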

Alternative query

The q.alt optional parameter defines a query that will be used in the absence of the main query.

The q.alt query is parsed by default using the Solr query parser, so it accepts the syntax we described in the previous paragraph. Using LocalParams, you can change the q.alt parser.

Minimum should match

Every word or phrase that is a part of the search string, unless it is constrained by the + or - operators (and therefore, marked as required or prohibited), is considered as optional. For those optional parts, the mm parameter defines the minimum number of matches that satisfy the query execution. The interesting point here is that other than accepting a quantity or a number, this parameter also allows complex expressions. The following table illustrates some examples of mm:

Value    Description
An integer (for example, 3)    At least the given number of optional clauses must match.
A percentage (for example, 66%)    At least the given percentage of optional clauses must match.
A negative number or a negative percentage    The number of optional clauses that must match is the result of subtracting the given value from the total number of optional clauses (absolute or 100 percent depending on the parameter value).
One or more expressions with the X<Y format    If there are X or fewer optional clauses, they must all match. If there are more than X clauses, then Y must be used as the mm value. Y can be a positive or negative integer or a percentage value. It is also possible to concatenate several expressions, like this: 3<75% 6<-1. This means that, with up to three optional clauses, all of them are required. Between 4 and 6 optional clauses, we require a match of 75 percent. Finally, for more than six clauses, we require a match of all clauses but one.

The several subqueries resulting from search terms parsing are constrained with the mm parameter (specifically, an additional Boolean query acting as a constraint is concatenated with the AND operator), so matching documents that don’t satisfy the mm constraint won’t be part of the search results.
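The rules in the preceding table can be turned into a small calculator. The following Python function is a sketch written from those rules, not Solr's own code: the name min_should_match is hypothetical, conditional expressions are assumed to use the "<" separator shown in the example, and percentages are assumed to be rounded down.

```python
def min_should_match(spec, optional_clauses):
    """Number of optional clauses that must match for a given mm value.

    Percentages are rounded down; negative values are subtracted from the
    total. The rounding rule is an assumption of this sketch.
    """
    spec = spec.strip()
    if "<" in spec:
        required = optional_clauses
        for condition in spec.split():
            threshold, value = condition.split("<", 1)
            if optional_clauses <= int(threshold):
                return required
            required = min_should_match(value, optional_clauses)
        return required
    if spec.endswith("%"):
        amount = int(optional_clauses * int(spec[:-1]) / 100)  # truncates toward zero
    else:
        amount = int(spec)
    required = optional_clauses + amount if amount < 0 else amount
    return max(0, min(optional_clauses, required))


for count in (2, 3, 4, 6, 7):
    print(count, "->", min_should_match("3<75% 6<-1", count))
```

Trying a few clause counts against an expression before putting it into a request handler is a cheap way to avoid surprises with short queries.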

Phrase fields

Once the list of matching documents has been populated according to the search criteria and constraints (for example, mm or filter queries), the pf parameter raises the score of documents that have search terms in proximity.

Like the qf parameter, pf can declare a list of fields with an optional boost factor.

Query phrase slop

The qs parameter indicates a proximity factor to be used in those phrase queries that may be included in the search string.

Phrase slop

The ps parameter indicates a proximity factor to be used in phrase queries built for pf fields. Note that such queries will be executed only to boost results (see the previous section), so this parameter doesn’t affect matching but only boosting.

Boost queries

The bq parameter defines a query parsed by the Solr query parser that will additionally boost search results. It can be repeated, thus allowing one or more queries.

If, for example, you want to give more importance to items with a price that falls within a given range, you can use a boost query like this:

price:[10.00 TO 19]

Additive boost functions

The bf parameter defines a function that will additionally boost search results by adding its value to the computed score. As with the bq parameter, it can be repeated in order to have multiple functions.

Tie breaker

The tie parameter is a float number. It has a value between 0 and 1, and it affects the strategy used by the parser to determine the final score of a given (matching) document.

The Disjunction Max parser, as said before, executes a set of subqueries on top of the fields declared in the qf parameter. The subquery that has the maximum score determines the score of the document. So schematically:

documentScore = score of matching subquery with highest score

However, you could end up with two documents getting the same score, because the maximum value computed by each winner subquery is the same.

The tie parameter lets you take fine-grained control of the final score assigned to each document, by including the score of all matching subqueries in the computation. Those additional scores are multiplied by a factor, the tie value. So, the preceding formula becomes the following:

documentScore = (score of matching subquery with highest score) + ((tie) * (scores of other matching subqueries))

With a value of 0.0, we will have a pure disjunction max query, where only the maximum score is included. A value of 1.0 will lead to a disjunction sum query, where the final score is the sum of the scores of all matching subqueries.
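The formula translates directly into code. The following Python function is a plain transcription of it; the document_score name and the sample scores are made up for illustration.

```python
def document_score(subquery_scores, tie=0.0):
    """Combine the scores of the matching subqueries as described above."""
    if not subquery_scores:
        return 0.0
    best = max(subquery_scores)
    others = sum(subquery_scores) - best
    return best + tie * others


scores = [1.2, 0.5, 0.3]  # made-up scores of three matching subqueries
print(document_score(scores, tie=0.0))
print(document_score(scores, tie=0.1))
print(document_score(scores, tie=1.0))
```

A small tie value keeps the best subquery dominant while still letting the other matches separate documents that would otherwise score the same.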

The Extended Disjunction Maximum query parser

This parser (eDisMax) is built on top of the DisMax parser and has some additional features such as fielded search, Boolean operators, term modifiers, and better handling of mistakes in queries.

Note

The eDisMax parser can be enabled by setting the defType parameter to edismax.

The example Solr instance has a request handler listening to /glike2 that uses the eDisMax parser.

The following sections describe additional parameters that this parser accepts. All parameters described in the DisMax parser section are included.

Fielded search

The eDisMax parser supports the full syntax of the Solr query parser, therefore allowing a so-called fielded search (that is, title:Jazz) with Boolean operators and term modifiers (for example, fuzzy and proximity).

In addition, this parser supports field aliasing and renaming. This allows you to give an interaction view to the requestor (for example, an end user, a query client, and so on) that is partially or completely decoupled from Solr’s underlying data model.

Aliasing is done using the following syntax:

f.<alias>.qf=(one or more real fields with optional boosts)

Here, <alias> is the virtual name that will be associated with the field (or fields) declared on the right operand. As you can see, an alias can be applied to single fields or to a group of fields. When aliases are declared, requestors can use them in their queries.

We can use aliases to localize field names:

f.artista.qf=artist // Italian users will see an "artista" field
f.kunstler.qf=artist // for German users

We can also use them to create metafields that group a set of real fields:

f.people.qf=author,illustrator,editor,translator
f.titles.qf=title,front_cover_title,sub_title,uniform_title

Phrase bigram and trigram fields

Other than supporting the pf parameter we have already seen for DisMax, this parser adds two optional features. The pf parameter boosts the score of documents where input terms appear in proximity. The pf2 and pf3 parameters offer the same feature but by splitting the input terms into consecutive bigrams and trigrams, respectively. Therefore, the All the things you are input string will become the following set of (consecutive) bigrams:

All the, the things, things you, you are

By the same logic, it will become the following set of trigrams:

All the things, the things you, things you are
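The splitting into consecutive word groups can be reproduced with a short helper. The following Python function is only an illustration of the grouping; word_ngrams is a hypothetical name and the input is split on whitespace, without any text analysis.

```python
def word_ngrams(text, n):
    """Return the consecutive n-word groups of a whitespace-separated string."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


query = "All the things you are"
print(word_ngrams(query, 2))
print(word_ngrams(query, 3))
```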

Phrase bigram and trigram slop

As ps sets the phrase slop for the pf parameter, ps2 and ps3 do the same for pf2 and pf3. If they are absent, the value of ps is used.

Multiplicative boost function

Like the bf parameter we have seen for the DisMax parser, the boost parameter declares a function. The difference here is that the computed score is multiplied by the function value, instead of having that value added to it.

User fields

The uf parameter specifies which fields (real or virtual) the requestors are allowed to use in their queries. Used in conjunction with aliasing, it allows you to completely hide real fields and have queries with only virtual (that is, aliased) fields.

Lowercase operators

In plain Solr query parser syntax, operators need to be in uppercase (AND, OR). The lowercaseOperators flag parameter, which defaults to true, allows lowercase tokens (and, or) to be interpreted as operators.

Note

At the time of writing this book, only the and and or Boolean operators are affected by this parameter. The NOT operator is not handled, and therefore, the lowercase word not is parsed as a literal term, even if lowercaseOperators is set to true. The Jira issue at https://issues.apache.org/jira/browse/SOLR-3580 tracks the activity on this topic.

Other available parsers

There are a lot of other available parsers, as listed in the following table:

Parser    Code    Description
Lucene query parser    lucene    The Lucene query parser has more or less the same features as the Solr query parser. However, this is the Lucene-specific implementation.
Function query parser    func    Creates a function query from the input string.
Join query parser    join    Normalizes relationships between documents by emulating a join.
Term query parser    term    Creates a single-term query from the input string.
Boost query parser    boost    Creates a boosted query from the input string. An additional parameter, b, is required to indicate the boost function.
Raw query parser    raw    Creates a term query from the input string without any text analysis.
Spatial filter query parser    geofilt    Enables spatial queries.
Field query parser    field    Creates a field query from the input string.
Surround query parser    surround    Creates a surround query. This query is used for proximity searches.

Besides all of this, the query parser framework has been conceived with extensibility in mind, so developers are free to implement, register, and use their own query parsers.

Search components

A search component is a reusable module that contributes to search results. While defining a search handler, that is, a controller for a given kind of search, you can customize its behavior by defining and configuring search components that will contribute to its output results.

Search components must be declared and used within solrconfig.xml, the main Solr configuration file. A component declaration requires a name, the implementation class, and a set of optional initialization parameters:

<searchComponent name="prices" class="a.b.c.MyComponent">
  <str name="ds-jndi">jdbc/datasource</str>
  <str name="service-uri">http://example.org#me</str>
</searchComponent>

Once declared, these can be used within request handlers, which are the runtime controllers of the executions of requests (we will cover request handlers later in the chapter).

There are some predefined search components that don’t need to be explicitly declared in solrconfig.xml.

Note

That doesn’t mean they are automatically enabled. They must be explicitly activated or disabled, depending on their default state.

The default components are those components that are responsible for carrying out the fundamental or common steps of a query execution flow. This is the reason there’s no need to declare them explicitly, unless you want to use a different configuration. In the following sections, we will illustrate these components.

Query

The query component is responsible for parsing and executing a query. This is the component that accepts query and query parser parameters, gets a reference to the appropriate query parser, coordinates the parser in order to produce a query, executes that query, and outputs a corresponding response.

Facet

This component enables the so-called faceted search. It contributes to search results by adding a set of configurable aggregations called facets.

When you execute some search, you will get back a single page of results consisting of a certain number of matching documents. Enabling faceting allows you to get an additional perspective of the overall data, consisting of a set of aggregations. The following screenshot shows some Solr-powered facets in action on a website, on the right side:

The facet component can be activated by specifying a facet parameter with one of the following values: yes, true, or on.

Solr provides several types of facets: queries, fields, ranges, pivot, and interval. Each of them, whenever enabled, will add a dedicated section to the response.

Facet queries

The facet.query parameter declares a query (parsed by the Solr query parser) that will be used as a facet with the corresponding counts. The results (that is, counts) of this facet will be in a specific response section called facet_queries. The parameter can be repeated multiple times, allowing us to specify several queries. Using the example dataset, with Solr running, open a browser and type http://127.0.0.1:8983/solr/example/select?q=*:*&facet=true&facet.query=genre:jazz

In the XML response, you will see matching documents within the <result> tag, and an additional section dedicated to facets:

<lst name="facet_counts">
  <lst name="facet_queries">
    <int name="genre:Jazz">3</int>
  </lst>
  <lst name="facet_fields"/>
  <lst name="facet_dates"/>
  <lst name="facet_ranges"/>
</lst>

Here, you can see that three documents match the facet query. The other facet sections are empty because we didn’t ask for them.

Facet fields

Facet fields are surely the most popular kind of facets. They aggregate search results using a set of given and configurable fields.

Note

Remember that a field must be declared as indexed in the schema in order to be faceted.

Other than activating the facet feature for a given field, Solr has a rich set of parameters that can be used to tune and configure the field’s faceting behavior. These settings can be specified for all fields or for a given field. For the first case, the following table illustrates the available parameters, their names, and meanings. For field-specific settings, the same parameters must be declared with the following convention:

f.<field>.<parameter>=<value>

In this way, the value associated with parameter will be valid only for the specific field.

Parameter    Description
facet.field    Declares a field that will be used as a facet. This parameter must be repeated for each facet field.
facet.prefix    Limits the terms used in faceting to values that begin with a given prefix.
facet.sort    The sort strategy of counts within each facet. Only two values are allowed: count, which means order by count, and index, which means lexicographic order.
facet.limit    The maximum number of counts that can be returned for each facet. A value of -1 will return all available counts.
facet.offset    Specifies a start offset within the available counts of facets.
facet.mincount    The minimum count needed for a facet value to be included in the response.
facet.missing    Includes in the response the count of documents that match the query but don’t have a value for a given facet.
facet.method    The type of algorithm that Solr will use to compute facets.
facet.threads    The number of parallel workers (that is, threads) that will compute the facets.

Returning to our previous example, let’s remove the facet query and use some additional parameters so that facet fields will be built (for simplicity, only the query string is reported):

q=*:*&facet=on&facet.field=genre&facet.mincount=1

In the facet sections, you will see the genre facets under the facet_fields subsection:

<lst name="facet_fields">
  <lst name="genre">
    <int name="Progressive Rock">10</int>
    <int name="Rock">5</int>
    <int name="Fusion">4</int>
    <int name="Heavy Metal">4</int>
    <int name="Pop metal">1</int>
  </lst>
</lst>

Weaskedforthegenrefacetandwesetmincountto1,whichmeansthatfacetswithnocountsareexcludedfromtheresponse.Itisimportanttounderlinethefactthatthedisplayedvalueforafacetfieldisitsindexedvalue,andnotthestoredvalue(thatis,thevaluethatiscopiedverbatimasitarrivesininputdocuments).Inthepreviousexample,thegenrefieldisString,andtherefore,itisnottokenized.Thisisthereasonyouseethecompoundterm(ProgressiveRock)asoneofitsvalues.IfthatfieldhadbeendeclaredasTextFieldandtokenizedwithWhiteSpaceTokenizer,youwouldhaveseentwodifferentvaluesforthatfacet(assumingnofurtherfiltering):ProgressiveandRock.

Facet ranges

Facet ranges can be applied to numeric or date fields. As the name suggests, with facet ranges, Solr creates a facet classification based on ranges. The following parameters control this kind of faceting:

Parameter           Description
facet.range         Declares a field that will be used as the facet range. The parameter must be repeated for each range field.
facet.range.start   Declares the start of the facet interval.
facet.range.end     Declares the end of the facet interval.
facet.range.gap     The size of each step between the start and the end of the interval.

The following is a sample query that uses facet ranges for faceting albums by release date:

q=*:*&facet=on&facet.range=released&facet.range.start=1950&facet.range.end=2000&facet.range.gap=10

That will add another section within the facet_counts element:

<lst name="facet_ranges">
  <lst name="released">
    <lst name="counts">
      <int name="1950">1</int>
      <int name="1960">1</int>
      <int name="1970">6</int>
      <int name="1980">8</int>
      <int name="1990">5</int>
    </lst>
  </lst>
</lst>
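To make the role of start, end, and gap more concrete, here is a small plain-Java sketch of the counting that a range facet performs. It is only an illustration, not Solr's implementation: the RangeFacetSketch class is hypothetical, the sample years are made up, and the sketch assumes that each bucket includes its lower bound and excludes its upper bound.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: mimics the counting behind a range facet in plain Java.
final class RangeFacetSketch {

    static Map<Integer, Integer> count(int[] values, int start, int end, int gap) {
        Map<Integer, Integer> counts = new LinkedHashMap<>();
        // One bucket per step between start (inclusive) and end (exclusive)
        for (int lower = start; lower < end; lower += gap) {
            counts.put(lower, 0);
        }
        for (int v : values) {
            if (v < start || v >= end) {
                continue; // outside the requested interval
            }
            // Lower bound of the bucket that contains v
            int lower = start + ((v - start) / gap) * gap;
            counts.merge(lower, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] released = {1959, 1969, 1973, 1975, 1986, 1992}; // made-up sample years
        System.out.println(count(released, 1950, 2000, 10));
    }
}
```

Each key of the returned map corresponds to the name attribute of an int element in the counts section shown above.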

Pivot facets

We previously described facet fields; they provide the ability to aggregate search results by one or more categories. Pivot facets go a step further in that direction. They allow us to analyze data in multiple dimensions, breaking down the faceted values by subsequent, nested subcategories.

This kind of faceting can be activated through a request like this:

q=*:*&facet=on&facet.pivot=genre,released

The facet.pivot parameter can be repeated multiple times. For each repetition, there will be a dedicated and aggregated result within the facet_pivot section of the response. Here, for simplicity, we put just one parameter with two categories, genre and released. The following example is an extract of the response you will get using the sample instance associated with this chapter:

<lst name="facet_pivot">
  <arr name="genre,released">
    <lst>
      <str name="field">genre</str>
      <str name="value">Progressive Rock</str>
      <int name="count">10</int>
      <arr name="pivot">
        <lst>
          <str name="field">released</str>
          <int name="value">1992</int>
          <int name="count">2</int>
        </lst>
        <lst>
          <str name="field">released</str>
          <int name="value">1969</int>
          <int name="count">1</int>
        </lst>
        …
      </arr>
    </lst>
    <lst>
      <str name="field">genre</str>
      <str name="value">Rock</str>
      <int name="count">5</int>
      <arr name="pivot">
        <lst>
          <str name="field">released</str>
          <int name="value">1969</int>
          <int name="count">1</int>
        </lst>
        <lst>
          <str name="field">released</str>
          <int name="value">1986</int>
          <int name="count">1</int>
        </lst>
        …
      </arr>
    </lst>
    …
  </arr>
</lst>

As you can see, the genre facet is broken down by a nested released category. Note that the preceding nested structure is returned with just one request-response interaction. In order to get the same result with classic facet fields, you would have to query Solr several times with incremental filters. That’s the reason the pivot facets feature, acting as a façade and hiding all of that interaction complexity, is very useful for navigating the hierarchy of those aggregations. However, it should be used carefully, as it could have an impact on performance.
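The nested breakdown can be pictured as a two-level grouping. The following plain-Java sketch is an illustration only (the PivotSketch class and its parallel-array input are hypothetical, and keys are sorted alphabetically here rather than by count as in the response above); it shows how a genre,released pivot nests one count inside another.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a two-level count, similar in spirit to facet.pivot=genre,released.
final class PivotSketch {

    // genres[i] and years[i] describe the i-th (hypothetical) album
    static Map<String, Map<Integer, Integer>> pivot(String[] genres, int[] years) {
        Map<String, Map<Integer, Integer>> result = new TreeMap<>();
        for (int i = 0; i < genres.length; i++) {
            // First level: genre; second level: release year with its count
            result.computeIfAbsent(genres[i], g -> new TreeMap<>())
                  .merge(years[i], 1, Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] genres = {"Rock", "Rock", "Jazz"};
        int[] years = {1969, 1969, 1959};
        System.out.println(pivot(genres, years));
    }
}
```

Obtaining the same structure with plain facet fields would require one filtered request per first-level value, which is the interaction that the pivot feature hides.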

Interval facets

Interval facets were introduced in Solr 4.10. They can be seen as an alternative to facet (range) queries because they allow you to set interval criteria for one or more fields, and count the number of matching documents that have values within those constraints.

Although the same result can be achieved with facet range queries, this implementation could provide a performance improvement in several contexts. As suggested in the Solr reference guide, it is recommended that you try both methods.

Highlighting

The highlight component contributes to search results by adding a section that contains (for each document in the current result page) a set of snippets highlighting the search terms that are in the document content (that is, in one or more fields of the document). The following screenshot shows a web application that uses the highlighting feature:

This feature is particularly useful when your data comes from rich documents such as PDFs or Microsoft Office documents (as shown in the preceding example). Using the highlighting feature, it’s possible to give the end user an approximate idea of the context where, within the document, the entered terms have been found.

Tip: Within the example Solr instance associated with this chapter, there is a request handler called /highlight that enables this feature on the title and artist fields.

The highlighting component can be tuned, or configured, with several parameters. Fortunately, the provided default values work well in many scenarios. Some of those parameters are described in the following table:

Parameter                       Description
hl                              Turns highlighting off or on. The default value is false.
hl.q                            Terms to be highlighted are taken from the main query unless this parameter, which itself requires a query, is specified.
hl.fl                           A space- or comma-separated list of fields that will be used for highlighting. Snippets will come only from these fields.
hl.snippets                     The number of highlighting snippets that will be returned. The default value is 1.
hl.maxAnalyzedChars             The maximum number of characters that will be inspected (in a given field) to compute the snippets.
hl.simple.pre / hl.simple.post  Indicates text that should appear before and after a highlighted term. They default to <em> and </em> HTML tags, respectively.

Solr comes with three different kinds of highlighters, described in the following sections.

Standard highlighter

This is the first highlighter that was introduced in Solr. Solr uses it by default. It is able to work on top of a lot of query types and doesn’t have any special requirements on fields to be highlighted. However, in order to speed up its work, termVectors should be turned on (for those fields).

Fast vector highlighter

The fast vector highlighter is the second type of highlighter introduced in Solr. It requires that termVectors, termPositions, and termOffsets are turned on for each field that needs to be highlighted. That allows fast and scalable execution, especially with documents containing large amounts of text, but requires a lot of extra space for the index. In addition, it supports fewer query types than the standard highlighter.

The fast vector highlighter can be enabled by setting the hl.useFastVectorHighlighter parameter to true.

Note that, if the preceding flags are not set for target fields, Solr will continue to use the standard highlighter.

Postings highlighter

This highlighter doesn’t use term vectors, nor does it reanalyze the text to be highlighted. It only requires the storeOffsetsWithPositions flag set for the fields to be highlighted. Unlike the others, this highlighter must be explicitly declared in the solrconfig.xml file with the following declaration:

<searchComponent class="solr.HighlightComponent" name="highlight">
  <highlighting class="org.apache.solr.highlight.PostingsSolrHighlighter"/>
</searchComponent>

This is a good compromise, compared with the first two highlighters, in terms of performance and index space. The information (that is, the posting offsets) required by the storeOffsetsWithPositions flag is cheaper than term vectors in terms of memory and disk occupation. However, it is meant to highlight simple query terms, so it could produce unexpected or unwanted results with phrase queries.


More like this

The more like this search component allows us to find documents that have some kind of similarity with a given document. There are several ways to use this feature in Solr:

MoreLikeThisHandler: This is a front controller that is completely dedicated to “more like this” requests. It accepts a query that identifies a document, and looks for similar documents according to a configured criterion.
MoreLikeThisHandler with externally supplied text: This is similar to the previous usage, but instead of taking a document as the input (matched by a given query), the text used to compute similarity can be directly passed or fetched from a URL.
MoreLikeThisComponent: As a search component, it will execute the similarity search for each document of the current result page, thus appending a more like this section to the Solr response, with a list of similar documents for each document. This is not really recommended because it could slow down overall query execution.

In general, the first type is the most widely used. MoreLikeThis doesn’t have special requirements for fields that are to be used for the similarity computation. However, for best performance, termVectors should be enabled for them.

The following table illustrates the parameters accepted by this component:

Parameter               Description
mlt                     Turns the more like this component off or on. It defaults to false.
mlt.count               The maximum number of similar documents that must be returned (for each document).
mlt.fl                  The fields used for similarity. They should have termVectors enabled (recommended) or they need to be stored.
mlt.qf                  A list of space- or comma-separated fields (already declared in mlt.fl) with corresponding boosts.
mlt.minwl / mlt.maxwl   The minimum and maximum word length boundaries. Words whose length falls outside these boundaries are ignored.
mlt.boost               A flag indicating whether the query will be boosted by the relevance of the interesting terms. It defaults to false.
mlt.mintf               This is the minimum term frequency boundary. It defaults to 2.
mlt.mindf               This is the minimum document frequency boundary. It defaults to 5.


Other components

Other than the components we saw in the previous sections, there are other built-in search components that are part of the Solr framework. Remember that, if you want to use them, they will have to be explicitly declared and configured within the Solr configuration.

The following is a short and non-exhaustive list of additional components:

Query elevation: This is used to give more importance to some results using a criterion that has nothing to do with the normal Solr scoring algorithm. The component lets you associate a given query with a corresponding list of most important results.
Terms: This provides access to the Lucene internal term dictionary.
Stats: This provides statistics on numeric fields.
Spellcheck: This provides spellchecking capabilities by means of n-gram analysis of indexed documents or external dictionaries. From a functional point of view, this component is used to build the so-called “Did you mean?” feature, offering alternative search suggestions in case of user mistakes.
TermVector: This adds term vectors (that is, term, frequency, position, offset, and IDF) of the matching documents to the response.
Debug: This adds debugging and explanatory information about the request execution.


Search handler

We saw request handlers in the previous chapter. There, we defined a request handler as a pluggable component that handles incoming requests. In that chapter, we were referring to update requests, that is, requests containing index update commands.

Here, we will focus our attention on SearchHandler, a special front controller used to handle incoming search requests. The SearchHandler class, although it could be seen as the supertype layer of all search handlers, is not abstract and it defines a standard search behavior.


Standard request handler

StandardRequestHandler is an empty subclass of SearchHandler, so at the time of writing this book, using either of them is basically the same. Request handlers are declared in the solrconfig.xml file, and they define search endpoints. Each instance is associated with a given name prefixed by a slash (the name must be unique), an implementation class, and a set of configuration parameters:

<requestHandler name="/mySearcher" class="solr.SearchHandler">
  (configuration)
</requestHandler>

With the sample Solr instance running, handlers declared in this way answer at URIs such as these:

http://localhost:8983/solr/example/query
http://localhost:8983/solr/example/facets
http://localhost:8983/solr/example/jazz

Configuring a SearchHandler instance means defining configuration parameters and (optionally) search components that will participate in the query execution chain.

Search components

Most of the time, unless you have a specific need, the search components that drive the logic of the search execution can be omitted because the following list will be automatically injected:

Code        Component
query       QueryComponent
facet       FacetComponent
mlt         MoreLikeThisComponent
highlight   HighlightComponent
stats       StatsComponent
debug       DebugComponent

Only the “query” component is enabled by default; the others need to be explicitly activated.

If the default chain is not what you need, it is possible to define a custom chain in the following way:

<arr name="components">
  <str>query</str>
  <str>facet</str>
  … other components follow
</arr>

This will completely replace the default chain. It is also possible to leave the default chain as it is and have additional prepended or appended components:

<arr name="first-components">
  <str>my_custom_component</str>
  … other components follow
</arr>
<arr name="last-components">
  <str>another_custom_component</str>
  … other components follow
</arr>

So, in general, the order of execution for search components will be the following:

Components declared as “first-components” (optional).
Components declared as “components”. In their absence, the default chain will be used.
Components declared as “last-components” (optional).
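The ordering rule just listed can be sketched in a few lines of plain Java. This is a simplified model, not Solr code: ComponentChainSketch and resolve are hypothetical names, null stands for "section not declared", and any special handling that Solr may apply to individual components is ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of how the effective component chain is assembled.
final class ComponentChainSketch {

    // The default chain, as listed in the table above
    static final List<String> DEFAULT_CHAIN =
            Arrays.asList("query", "facet", "mlt", "highlight", "stats", "debug");

    // Each argument may be null, meaning that the section was not declared
    static List<String> resolve(List<String> first, List<String> components, List<String> last) {
        List<String> chain = new ArrayList<>();
        if (first != null) {
            chain.addAll(first);              // "first-components"
        }
        // "components" replaces the default chain when present
        chain.addAll(components != null ? components : DEFAULT_CHAIN);
        if (last != null) {
            chain.addAll(last);               // "last-components"
        }
        return chain;
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, null, Arrays.asList("prices")));
    }
}
```

With only a last-components section declared, the custom component ends up after the whole default chain, which is what a component that post-processes query results needs.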

The following is an example declaration of StandardRequestHandler:

<requestHandler name="/jazz" class="solr.StandardRequestHandler">
  <!-- parameters that will always be applied to the incoming requests -->
  <lst name="invariants">
    <int name="rows">10</int>
  </lst>
  <!-- parameters that will always be added to the incoming requests -->
  <lst name="appends">
    <str name="fq">genre:jazz</str>
  </lst>
  <!-- default settings that can be overridden by the incoming requests -->
  <lst name="defaults">
    <str name="sort">title asc</str>
    <str name="echoParams">explicit</str>
    <str name="q">*:*</str>
    <bool name="facet">false</bool>
  </lst>
  <!-- This is a custom search component that will run after the default component chain -->
  <arr name="last-components">
    <str>prices</str>
  </arr>
</requestHandler>

Query parameters

The request handlers and the search components involved in the chain accept several parameters to drive their execution logic. These parameters (with corresponding values) can be declared in three different sections:

defaults: Parameter values will be used unless overridden by incoming requests
appends: Parameter values will be appended to each request
invariants: Parameter values will always be applied and cannot be overridden by incoming requests or by the values declared in the defaults and appends sections

All sections are optional, so you can have no parameters configured for a given handler and allow the incoming requests to define them. This is an example of a handler configuration:

<lst name="defaults">
  <str name="defType">edismax</str>
</lst>
<lst name="appends">
  <str name="facet.field">artist</str>
  <str name="facet.field">genre</str>
</lst>
<lst name="invariants">
  <str name="wt">json</str>
  <bool name="facet">true</bool>
</lst>
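The interplay of the three sections can be modeled as a simple merge. The sketch below is plain Java and only illustrative: HandlerParamsSketch and effective are hypothetical names, parameters are modeled as a map from name to list of values, and the merge order (defaults, then request, then appends, then invariants) is an assumption derived from the descriptions above rather than taken from Solr's source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of how handler sections and request parameters combine.
final class HandlerParamsSketch {

    static Map<String, List<String>> effective(
            Map<String, List<String>> defaults,
            Map<String, List<String>> request,
            Map<String, List<String>> appends,
            Map<String, List<String>> invariants) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        // 1. defaults are the starting point
        defaults.forEach((k, v) -> out.put(k, new ArrayList<>(v)));
        // 2. request values override defaults
        request.forEach((k, v) -> out.put(k, new ArrayList<>(v)));
        // 3. appends add values without removing existing ones
        appends.forEach((k, v) -> out.computeIfAbsent(k, x -> new ArrayList<>()).addAll(v));
        // 4. invariants win over everything else
        invariants.forEach((k, v) -> out.put(k, new ArrayList<>(v)));
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> defaults = new LinkedHashMap<>();
        defaults.put("defType", Arrays.asList("edismax"));
        Map<String, List<String>> request = new LinkedHashMap<>();
        request.put("wt", Arrays.asList("xml"));
        request.put("facet.field", Arrays.asList("genre"));
        Map<String, List<String>> appends = new LinkedHashMap<>();
        appends.put("facet.field", Arrays.asList("artist"));
        Map<String, List<String>> invariants = new LinkedHashMap<>();
        invariants.put("wt", Arrays.asList("json"));
        System.out.println(effective(defaults, request, appends, invariants));
    }
}
```

In this model a client asking for wt=xml still receives JSON, because the invariant value replaces the one sent with the request, while the appended facet.field value is added next to the one the client supplied.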


RealTimeGetHandler

RealTimeGetHandler is basically a SearchHandler subclass that adds RealTimeGetComponent to the search request execution. In this way, it’s possible to retrieve the latest version of softly committed documents by specifying their identifiers.

In order to enable such a component, you must turn the update log feature on, in solrconfig.xml:

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
</updateHandler>

Then the request handler can be declared and configured using the procedure that we saw in the previous section:

<requestHandler name="/get" class="solr.RealTimeGetHandler">
</requestHandler>

This handler accepts an additional id or ids parameter that allows us to specify the identifiers of the documents we want to retrieve. The id parameter accepts one identifier and can be repeated in requests. The ids parameter accepts a comma-separated list of identifiers.

Tip: Once the example Solr instance is up, this handler responds to /get requests.


Response output writers

As a last step, query results are returned to requestors in a given format. Solr communicates with clients using the HTTP protocol. Those clients are free to start the interaction by asking for one format or another, depending on their needs.

Although a default format can be set, the client can override it by means of the wt parameter. The value of the wt parameter is a mnemonic code associated with an available response writer.

There are several built-in response writers in Solr, which are described here:

Response Writer     Description
xml                 The eXtensible Markup Language response writer. This is the default writer.
xslt                Combines the XML results with an XSLT file in order to produce custom XML documents.
json                JavaScript Object Notation response writer.
csv                 Comma-Separated Value response writer.
velocity            This uses Apache Velocity to directly build web pages with query results. It is very useful for fast prototyping.
javabin             Java clients have a privileged way to obtain results from Solr using this response writer, which directly outputs Java Objects.
python, ruby, php   Specialized response writers for these languages that produce a structure directly tied to the language requirements.


Extending Solr

The following sections will describe and illustrate a couple of ways of extending and customizing searches in Solr.


Mixing real-time and indexed data

Sometimes, as a part of your search results, you may want to have data that is not managed by Solr but retrieved from a real-time source, such as a database.

Think of an e-commerce application; when you search for something, you will see two pieces of information beside each item:

Price: This could be the result of some kind of frequently updated marketing policy. Non-real-time information could cause problems on the vendor side (for example, a wrong price policy could be applied).
Availability: Here, wrong information could cause an invalid claim from customers; for example, “I bought that book because I saw it as available, but it isn’t!”

This is a good scenario for developing a search component. We will create our search component and associate it with a given RequestHandler.

A search component is basically a class that extends (not surprisingly) org.apache.solr.handler.component.SearchComponent:

public class RealTimePriceComponent extends SearchComponent

The initialization of the component is done in a method called init. Here, most probably we will get the JNDI name of the target data source from the configuration. This source is where the prices must be retrieved from:

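public void init(NamedList args) {
  String dsName = SolrParams.toSolrParams(args).get("ds-name");
  try {
    Context ctx = new InitialContext();
    this.datasource = (DataSource) ctx.lookup(dsName);
  } catch (NamingException exception) {
    // The JNDI lookup throws a checked exception that init cannot propagate
    throw new IllegalArgumentException("Unable to look up " + dsName, exception);
  }
}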

Now we are ready to process the incoming requests. This is done in the process method, which receives a ResponseBuilder instance, the object we will use to add the component contribution to the search output. Since this component will run after the query component, it will find a list containing query results in ResponseBuilder. For each item within those results, our component will query the database in order to find a corresponding price:

public void process(ResponseBuilder builder) throws IOException {
  SolrIndexSearcher searcher = builder.req.getSearcher();
  // Holds the component contribution
  NamedList<Object> contrib = new SimpleOrderedMap<Object>();
  // Only the id field needs to be loaded from each document
  Set<String> fieldset = Collections.singleton("id");
  for (DocIterator iterator = builder.getResults().docList.iterator(); iterator.hasNext();) {
    // This is the Lucene internal document id
    int docId = iterator.nextDoc();
    Document ldoc = searcher.doc(docId, fieldset);
    // This is the Solr document id
    String id = ldoc.get("id");
    // Get the price of the item
    BigDecimal price = getPrice(id);
    // Add the price of the item to the component contribution
    contrib.add(id, price);
  }
  // Add the component contribution to the response builder
  builder.rsp.add("prices", contrib);
}

In solrconfig.xml, we must declare the component in two places. First, we must declare and configure it in the following manner:

<searchComponent name="prices" class="a.b.c.RealTimePriceComponent">
  <str name="ds-name">jdbc/prices</str>
</searchComponent>

Then it has to be enabled in request handlers (as shown in the following snippet). Since this component is supposed to contribute to a set of query results, it must be placed after the query component:

<requestHandler name="/xyz" …>
  <arr name="last-components">
    <str>prices</str>
  </arr>
</requestHandler>

Done! If you run a query invoking the /xyz request handler, you will see, after the query results, a new section called prices (the name we used for the search component). This reports the document id and the corresponding price for each document in the search results.

Tip: You can find the source code of the entire example in the src folder of the project associated with this chapter, under the org.gazzax.labs.solr.ase.ch3.sp package.

If you want to start Solr with that component, just run the following command from the command line or from Eclipse:

mvn clean install cargo:run -Pcustom-search-component


Using a custom response writer

In a project I was working on, we implemented the autocomplete feature, that is, a list of suggestions that quickly appears under the search field each time a user types a key. Thus, the search string is gradually composed. The following screenshot shows this feature:

A new response writer was implemented because the user interface widget had already been built by another company, and the exchange format between that widget and the search service had already been defined.

Doing that in Solr is very easy. A response writer is a class that implements org.apache.solr.response.QueryResponseWriter. Like all Solr components, it can be optionally initialized using an init callback method, and it provides a write method where the response should be serialized according to a given format:

public void write(
    Writer writer,
    SolrQueryRequest request,
    SolrQueryResponse response) throws IOException {
  // 1. Get a reference to the values that compose the current response
  NamedList elements = response.getValues();
  // 2. Use a StringBuilder to build the output
  StringBuilder builder = new StringBuilder("{")
      .append("query:'")
      .append(request.getParams().get(CommonParams.Q))
      .append("',");
  // 3. Get a reference to the object that holds the query result
  Object value = elements.getVal(1);
  if (value instanceof ResultContext) {
    ResultContext context = (ResultContext) value;
    // The ordered list (actually the page subset) of matched documents
    DocList ids = context.docs;
    if (ids != null) {
      SolrIndexSearcher searcher = request.getSearcher();
      DocIterator iterator = ids.iterator();
      builder.append("suggestions:[");
      // 4. Iterate over documents
      for (int i = 0; i < ids.size(); i++) {
        // 5. For each document we need to get the "label" attribute
        Document document = searcher.doc(iterator.nextDoc(), FIELDS);
        if (i > 0) { builder.append(","); }
        // 6. Append the label value to the writer output
        builder
            .append("'")
            .append((String) document.get("label"))
            .append("'");
      }
      builder.append("]").append("}");
    }
  }
  // 7. And finally write out the result
  writer.write(builder.toString());
}

That’s all! Now try issuing a query like this:

http://127.0.0.1:8983/solr/example/auto?q=ma

Solr will return the following response:

{
  query:'ma',
  suggestions:['Marcus Miller','Michael Manring','Got a match','Nigerian Marketplace','The Crying machine']
}

Tip: You can find the source code of the entire example under the org.gazzax.labs.solr.ase.ch3.rw package of the source folder in the project associated with this chapter.

If you want to start Solr with that writer, run the following command from the command line or from Eclipse:

mvn clean install cargo:run -Pcustom-response-writer


Troubleshooting

This section will provide help, tips, and suggestions about difficulties that you could meet while you’re experimenting with what we described in this chapter.


Queries don’t match expected documents

There’s no single answer to this big and popular question. Without any additional information, the first two things I would do are as follows:

Retry the query by appending debug parameters (for example, debugQuery and explainOther) and analyze the explain section. There’s a wonderful online tool (http://explain.solr.pl) that makes life easy by explaining debug information.
Use the field analysis page, type some sample values, and see what happens at index and query time. Probably, your analyzer chains are not consistent.


Mismatch between index and query analyzer

Using different analyzer chains at index and query time sometimes causes problems because the tokens produced at query time don’t match, as one would expect, the tokens produced at index time. The field analysis page helps a lot in debugging these situations. Type a value for a field and see what happens at query and index time. In addition, this page provides a check for highlighting all matches between index and query tokens.


No score is returned in response

The score field is a virtual field that must be explicitly asked for in requests. A value of * in the fl parameter is not enough because * means “all real fields.” A request for all real fields that also includes the score must provide an fl parameter with the value of *,score. Note that this is valid in general for all virtual fields (for example, functions, transformers, and so on).


Summary

In this chapter we met the Solr search capabilities, a huge set of features that power information retrieval on Solr. We saw a lot of tools used to improve the search experience of clients, requestors, and last but not least, end users. After examining the indexing phase, you can well imagine that search and information retrieval constitute the actual functional goals of a full-text search platform.

We met the different pieces that make up Solr’s search capabilities: analyzers, tokenizers, query parsers, search components, and output writers. For all of them, Solr provides a good set of alternatives, already implemented and ready to use. For those who have specific requirements, it is always possible to create customizations and extensions.

In the next chapter, keeping in mind the big picture of crucial phases in an information retrieval system, we will take a look at client APIs. The available libraries are great examples of how to use Solr’s HTTP services to work programmatically with it on the client side.


Chapter 4. Client API

A search application needs to interact with Solr by issuing index and search requests. Although Solr exposes these services through HTTP, working at that (low) level is not so easy for a developer. Client APIs are façade libraries that hide the low-level details of client-server communication. They allow us to interact with Solr using client-native constructs and structures such as the so-called Plain Old Java Object (POJO) in the Java programming language.

In this chapter we will describe Solrj, the official Solr client Java library. We will also describe the structure and the main classes involved in index and search operations. The chapter will cover the following topics:

Solrj: the official Java client library
Other available bindings


Solrj

Solrj is the name of the official Solr Java client. It completely abstracts the underlying (HTTP) transport layer and offers a simple interface to client applications to interact with Solr.


SolrServer – the Solr façade

A client library necessarily needs a façade or a proxy, that is, an object representing the remote resource that hides and abstracts the low-level details of client-server interaction. In Solrj, this role is played by classes that extend the org.apache.solr.client.solrj.SolrServer abstract class. At the time of writing this book, these are the available SolrServer implementations:

EmbeddedSolrServer: This connects to a local SolrCore without requiring an HTTP connection. This is not recommended in production but is definitely useful for unit tests and development.
HttpSolrServer: This is a proxy that connects to a remote Solr using an HTTP connection.
LBHttpSolrServer: A proxy that wraps multiple HttpSolrServer instances and implements client-side, round-robin load balancing between them. It also periodically checks the (running) state of each server, removing or adding members to the round-robin list as needed.
ConcurrentUpdateSolrServer: This is a proxy that uses an asynchronous queue to buffer input data (that is, documents). Once a given buffer threshold is reached, data is sent to Solr using a configurable number of dequeuer threads.
CloudSolrServer: A proxy used to communicate with SolrCloud.

Although all the SolrServer implementations mentioned previously offer the same functionalities, HttpSolrServer and LBHttpSolrServer are better suited for issuing queries, while ConcurrentUpdateSolrServer is recommended for update requests.
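The idea of client-side round-robin balancing with a liveness check can be sketched without any Solr dependency. The following plain-Java class is only a model of the general technique, not the LBHttpSolrServer implementation: RoundRobinSketch, pick, and the set of servers currently considered down are all hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Model of client-side round-robin selection that skips servers known to be down.
final class RoundRobinSketch {

    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinSketch(List<String> servers) {
        this.servers = servers;
    }

    // Returns the next server not contained in 'down', or null if none is alive
    String pick(Set<String> down) {
        for (int attempts = 0; attempts < servers.size(); attempts++) {
            // floorMod keeps the index valid even if the counter overflows
            int index = Math.floorMod(next.getAndIncrement(), servers.size());
            String candidate = servers.get(index);
            if (!down.contains(candidate)) {
                return candidate;
            }
        }
        return null;
    }
}
```

A caller would invoke pick before each query and send the request to the returned address; keeping the set of unavailable servers up to date is the part that a periodic health check would take care of.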

Tip: The test case, org.gazzax.labs.solr.ase.ch3.index.SolrServersITCase, contains several methods that demonstrate how to index data using different types of servers.


Input and output data transfer objects

As described in the previous chapters, a Document is a central concept in Solr. It represents an atomic unit of information exchanged between the client and the server. The Solr API separates input documents from output documents using the SolrInputDocument and SolrDocument classes, respectively.

Although they share basic data transfer object behavior, each of them has its own specific features associated with the direction of the client-server interaction in which it is supposed to play.

SolrInputDocument is a write object. You can add, change, and remove fields in it. You can also set a name, value, and optional boost for each of them:

public void addField(String name, Object value)
public void addField(String name, Object value, float boost)
public void setField(String name, Object value)
public void setField(String name, Object value, float boost)

SolrDocument is the output data transfer object, and it is primarily intended as a query result holder. Here, you can get field values, field names, and so on:

public Object getFieldValue(String name)
public Collection<Object> getFieldValues(String name)
public Object getFirstValue(String name)

Within an UpdateRequestProcessor instance, or while adding data to Solr, we will use SolrInputDocument instances. In QueryResponse (that is, the result of a query execution), we will find SolrDocument instances.

Tip: All the examples in the sample project associated with this chapter make extensive use of these data transfer objects.


Adds and deletes

Once a valid reference of a SolrServer has been created, adding data to Solr is very easy. The SolrServer class defines several methods to do this:

UpdateResponse add(SolrInputDocument doc)
UpdateResponse add(Collection<SolrInputDocument> docs)

So we first create one or more SolrInputDocument instances filled with the appropriate data:

final SolrInputDocument doc1 = new SolrInputDocument();
doc1.setField("id", 1234);
doc1.setField("title", "Delicate Sound of Thunder");
doc1.addField("genre", "Rock");
doc1.addField("genre", "Progressive Rock");

Then, using the proxy instance, we can add that data:

solrServer.add(doc1);

Finally, we can commit:

solrServer.commit();

We can also accumulate all the documents within a list and use that as the argument of the add method.

Following the same logic as described in the second chapter for REST services, SolrServer provides the following methods to delete documents:

UpdateResponse deleteById(String id)
UpdateResponse deleteById(String id, int commitWithinMs)
UpdateResponse deleteById(List<String> ids)
UpdateResponse deleteById(List<String> ids, int commitWithinMs)
UpdateResponse deleteByQuery(String query)
UpdateResponse deleteByQuery(String query, int commitWithinMs)

Tip: The org.gazzax.labs.solr.ase.ch3.index.SolrServersITCase test case contains several methods that illustrate how to index and delete data.


Search

Searching with Solrj requires knowledge of (mainly) two classes: org.apache.solr.client.solrj.SolrQuery and org.apache.solr.client.solrj.response.QueryResponse. The first is an object representation of a query that can be sent to Solr. It allows us to inject all the parameters we described in the previous chapter. One way of doing this is through dedicated methods, such as these:

SolrQuery setQuery(String query)
SolrQuery setRequestHandler(String qt)
SolrQuery addSort(String field, ORDER order)
SolrQuery setStart(Integer start)
SolrQuery setFacet(boolean b)
SolrQuery addFacetField(String… fields)
SolrQuery setHighlight(boolean b)
SolrQuery setHighlightSnippets(int num)

Alternatively, generic setter methods are provided:

SolrQuery setParam(String name, String… values)
SolrQuery setParam(String name, boolean value)

Note that all the preceding methods return the same SolrQuery object, thus allowing a caller to chain method calls, like this:

SolrQuery query = new SolrQuery()
    .setQuery("Charles Mingus")
    .setFacet(true)
    .addFacetField("genre")
    .addSort("title", SolrQuery.ORDER.asc)
    .addSort("released", SolrQuery.ORDER.desc)
    .setHighlight(true);

Once a SolrQuery has been built, we can use the appropriate method in the SolrServer proxy to send the query request:

QueryResponse query(SolrParams params)

The method returns a QueryResponse, which is an object representation of the response that Solr sent back as a result of the query execution. With that object, we can get the list of SolrDocuments of the currently returned page. We can also get facets and their values, and in general, we can inspect and access any part of the response.

Tip: The org.gazzax.labs.solr.ase.ch3.search.SearchITCase test case contains several examples that demonstrate how to query with Solrj.

The following is an example of the use of QueryResponse:

// Executes a query and gets the corresponding response
QueryResponse res = solrServer.query(aQuery);
// Gets the request execution elapsed time
long elapsedTime = res.getElapsedTime();
// Gets the results (i.e. a page of results)
SolrDocumentList results = res.getResults();
// How many total hits for this response
long totalHits = results.getNumFound();
// Iterates over the current page
for (SolrDocument document : results) {
  // Do something with the current document
  String title = (String) document.getFieldValue("title");
}
// Gets the facet field "genre"
FacetField ff = res.getFacetField("genre");
// Iterates over the facet values
for (Count count : ff.getValues()) {
  String name = count.getName(); // e.g. Jazz
  long howMany = count.getCount(); // e.g. 19
}
// The highlighting section is a bit complicated, as the
// value object is a composite map where keys are the document identifiers
// while values are maps with highlighted fields as keys and snippets (a list
// of snippets) as values.
Map<String, Map<String, List<String>>> hl = res.getHighlighting();
// Iterates over the highlighting section
for (Entry<String, Map<String, List<String>>> docEntry : hl.entrySet()) {
  String docId = docEntry.getKey();
  // Iterates over highlighted fields
  for (Entry<String, List<String>> fEntry : docEntry.getValue().entrySet()) {
    String fieldName = fEntry.getKey();
    // Iterates over snippets
    for (String snippet : fEntry.getValue()) {
      // Do something with the snippet
    }
  }
}


Other bindings

Solrj is a very powerful client API, but of course, it is only available for Java clients. Since Solr services are exposed using standard HTTP procedures, other client API implementations have been created for other languages. Hence, it is possible to interact with Solr using Python, Perl, Ruby, .NET, or your favorite programming language.

The following table lists some of them, together with their location (only Solrj is a part of the Solr distribution; all other client libraries are independent projects):

Project Language Address

sunburnt Python https://pypi.python.org/pypi/sunburnt

pysolr Python https://pypi.python.org/pypi/pysolr/3.2.0

solrcloudpy Python https://pypi.python.org/pypi/solrcloudpy

solr-ruby Ruby https://github.com/erikhatcher/solr-ruby-flare/tree/master/solr-ruby

Blacklight Ruby http://projectblacklight.org

Solarium PHP http://www.solarium-project.org/

Solr-PHP-UI PHP http://www.opensemanticsearch.org/solr-php-ui/

PECL/Solr PHP http://pecl.php.net/package/solr

Flux Clojure https://github.com/mwmitchell/flux

solr-scala-client Scala https://github.com/takezoe/solr-scala-client

SolrNet .NET https://github.com/mausch/SolrNet

A complete and updated list of all bindings is available at https://wiki.apache.org/solr/IntegratingSolr.

Summary

A distributed search system, such as Solr, requires remote service invocations to send and receive data across a network. Clients without appropriate APIs will be exposed to the complexity of dealing with low-level details of the communication protocol.

Since Solr provides all core services through HTTP, a lot of client libraries have been developed to hide that complexity. Regardless of the concrete binding, a client library encapsulates the low-level details of client-server communication and provides a uniform service interface for clients.

In this chapter, we focused on the Solr client APIs, specifically on the official Java binding called Solrj, its main features, and the main classes involved in index and query operations.

We briefly described and listed some other popular bindings that have been developed on top of the Solr HTTP services.

In the next chapter, we will return to the server side to describe how to fine-tune and manage a Solr instance.

Chapter 5. Administering and Tuning Solr

You can manage a Solr installation using any of the several system administration tools provided with Solr. The system administration tools include the Administration Console, the REST services, and the JMX API, with which you manage and monitor cores, hardware resources, runtime configuration, and the health of the Solr environment to ensure maximum availability and performance.

Although the topic of administration is usually outside the scope of a developer’s sphere, most probably you, as a provider of a solution based on Solr, will need to know something about it. Specifically, you need to know about a set of tools that let you monitor Solr, tune it, and investigate problems.

Throughout this chapter, we will use a Solr instance preloaded with sample data. In order to have that up and running, you should check out the source code of the book, go to the ch5 folder, and run this (using Eclipse or from the command line):

# mvn clean install cargo:run

Tip: The ch5 sample project has a preconfigured Eclipse launcher used to run Solr. You can find it under the src/dev/eclipse folder. Just right-click on start-ch5-server.launch and select the Debug as menu item.

This chapter will describe the most relevant sections of the Solr Administration Console. We will also explore the JMX API. Each time a hardware resource is involved, we will talk about it. Specifically, this chapter will cover the following topics:

The Solr Administration Console
Usage of hardware resources
JConsole and JMX

Dashboard

The Administration Console is a web application that is part of Solr. You can access the Administration Console through a web browser from any machine on the local network that can communicate with Solr.

Type http://127.0.0.1:8983/solr in the web browser’s address bar. The first page that appears is the dashboard, as shown in the following screenshot:

This is where you can see general information about Solr (for example, the version, startup time, and so on) and about its hosting environment (for example, JVM version, JVM args, processors, physical and JVM memory, and file descriptors).

Physical and JVM memory

The first and the last gray bars on the right side of the dashboard represent the physical and JVM memory, respectively. The first measure is the amount of memory that is available in the hosting machine. The second measure is the amount assigned to the JVM at startup time by means of the -Xms and -Xmx options.

Tip: For a complete list of available JVM options, see https://docs.oracle.com/cd/E22289_01/html/821-1274/configuring-the-default-jvm-and-java-arguments.html.

Each bar reports both the available amount and the used amount of memory. As you can imagine, memory is one of the crucial factors concerning Solr performance and response times.
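The JVM figures shown by that bar can also be read from inside any Java process. The following is a minimal sketch, not Solr code: the class and method names are made up for illustration, and it only relies on the standard java.lang.Runtime API. It reports the memory of the JVM that runs it, so it describes Solr only if it is executed inside the same process.

```java
class JvmMemorySnapshot {

    /** Returns {used, committed, max} heap sizes in bytes for the current JVM. */
    static long[] heapUsage() {
        Runtime rt = Runtime.getRuntime();
        long committed = rt.totalMemory();        // heap currently reserved by the JVM
        long used = committed - rt.freeMemory();  // portion occupied by live or not yet collected objects
        long max = rt.maxMemory();                // upper bound, controlled by -Xmx
        return new long[] {used, committed, max};
    }

    /** Formats the snapshot in megabytes, similar in spirit to the dashboard bar. */
    static String describe() {
        long[] h = heapUsage();
        long mb = 1024L * 1024L;
        return "used=" + (h[0] / mb) + "MB committed=" + (h[1] / mb) + "MB max=" + (h[2] / mb) + "MB";
    }
}
```

Calling JvmMemorySnapshot.describe() at different moments gives a rough idea of how far the heap is from the -Xmx ceiling.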

When we think about a web application, we may consider it as a standalone container that, for example, reads data from an external database and shows some dynamic pages to the end users. Solr is not like that; it is a service. Despite its web-application-like nature, it makes extensive use of local hardware resources such as disk and memory.

Memory (here, I’m referring to the JVM memory) is used by Solr for a lot of things (for example, caches, sorting, faceting, and indexing), so understanding all those mechanisms is crucial to determine the right amount of memory one should assign to the JVM.

Note: There’s a useful spreadsheet (although we already mentioned this in the first chapter) that you can find in the Solr source repository at https://svn.apache.org/repos/asf/lucene/dev/trunk/dev-tools/size-estimator-lucene-solr.xls. It is a good starting point from which to estimate RAM and disk space requirements.

However, a resource that is often considered as external to the Solr domain is the system memory, that is, the remaining memory available for the operating system once the JVM memory has been deducted.

In an optimal situation, that kind of memory should be enough to:

Let the operating system manage its resources.
Accommodate the Solr index. Ideally, if it is able to contain the whole index, there won’t be any disk seek.

The first point is quite obvious; an operating system needs a given amount of memory to manage its ordinary tasks.

The second point has to do with the so-called (OS) filesystem cache. The JVM works directly with the memory that we made available in the startup command line by means of the -Xms and -Xmx options. This is the memory we are using in our Java application to load object instances, implement application-level caches, and so on.

However, applications such as Solr that widely use filesystem resources (to load and write index files) also rely on another important part of the memory that is available for the operating system and is used to cache files. Once a file is loaded, its content is kept in memory until the system requires that space for other purposes. Data in this filesystem cache provides quick access, without requiring disk accesses and seeks.

Note: Remember that this type of memory has nothing to do with the memory assigned to the JVM.

As you can imagine, this aspect can dramatically improve overall performance in both index (writes) and query (reads) phases. In those cases where it’s not possible to fit all of the index in the filesystem cache (the index can easily reach a size that is relatively small in terms of disk space but definitely huge in terms of memory), the system memory should be enough to allow efficient load and unload management of that filesystem cache.

Disk usage

The dashboard page reports information about the swap space, but it says nothing about disk usage. This is because that kind of information is reported in a dedicated section for every managed core. Unfortunately, there isn’t a central point where it’s possible to see the total disk space used by the instance.

As described in the previous section, the disk is a resource widely used by Solr, and its role is fundamental for getting optimal performance. Here, we can add additional information by mentioning Solid State Disks (SSD), which are usually a very good choice for getting fast reads and writes. But again, the most critical factor is understanding and tuning the filesystem cache; in the most extreme cases, this avoids disk seeks entirely. To put it in a nutshell: SSDs are fast, but memory is better.

File descriptors

The third bar (shown in the previous screenshot) shows the maximum number (light gray) and the number actually open (dark gray) of file descriptors associated with the Java process that runs Solr (that is, the Java process of your servlet container).

A Solr index can be composed of a lot of files that need to be opened at least once. Especially if you have many cores, frequent changes, commits, and optimizes, the incremental nature of a Solr index can lead to exhaustion of all the available file descriptors. This is usually the case where you get an IOException (too many open files).

The first place where you can manage and limit the number of files used by Solr is Solr itself. Within the solrconfig.xml file, you’ll find a <mergeFactor> parameter in the <indexConfig> section. This parameter decides how many segments will be merged at a time.

The Solr/Lucene index is composed of multiple subindexes called segments. Each segment is an independent index composed of several files. When documents are added, updated, or deleted, Solr asynchronously persists those changes by creating new segments or merging existing segments. This is the reason the total number of files composing the index will necessarily change (it changes gradually, following a reasonable amount of changes applied to your dataset). Hence, it needs to be monitored.

With a mergeFactor value set to 10 (the default value), there will be no more than nine segments at a given moment. When update thresholds (the maxBufferedDocs or ramBufferSizeMB parameters) are reached, a new segment will be created. If the total number of segments is equal to the configured mergeFactor, Solr will attempt to merge all existing segments into a new segment.
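The rule just described can be illustrated with a deliberately simplified model. The sketch below is a toy simulation written for this explanation, not Lucene’s merge policy: it assumes a single level of segments, that each flush produces exactly one segment, and that a merge completes instantly. All names are hypothetical.

```java
class MergeFactorSketch {

    /**
     * Simulates a number of segment flushes and returns the highest
     * segment count observed once any triggered merge has completed.
     */
    static int maxSegmentsAfterMerges(int mergeFactor, int flushes) {
        int segments = 0;
        int maxSeen = 0;
        for (int i = 0; i < flushes; i++) {
            segments++;                      // an update threshold was reached: one new segment
            if (segments == mergeFactor) {   // as many segments as the mergeFactor...
                segments = 1;                // ...so they are merged into a single new segment
            }
            maxSeen = Math.max(maxSeen, segments);
        }
        return maxSeen;
    }
}
```

Under these assumptions, a mergeFactor of 10 never leaves more than nine segments in place after a step, which matches the statement above. Since every segment consists of several files, a lower mergeFactor keeps fewer files open at the cost of more frequent merges.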

Another parameter in the solrconfig.xml file that has an impact on the number of open files is <useCompoundFile>. If this is set to true (note that it defaults to false), Solr will combine the files that make up a segment into a single file. While that may produce a benefit in terms of open file descriptors, it may also lead to some performance issues because of the monolithic nature of the compound file.

On top of that, there are scenarios where a lot of files are the natural consequence of your infrastructure. Think of a system with several cores, for example. The previous settings are specific to a single core, but what if you have a lot of them?

Tip: When I use Solr for library search services, I usually create at least six cores: one for the main index, one that holds the headings used for the autocompletion feature, and one for each alphabetical index (for example, authors, titles, subjects, and publishers). There are some customers who require up to 50 alphabetical indexes (which means up to 50 cores).

In such cases, after checking your application and seeing that it effectively requires more file descriptors than the default (usually 1024), you may want to increase that limit by using the ulimit command, as follows:

# ulimit -n 5000

Here, 5000 is the new limit. Note that this command requires root privileges and it applies that limit only to the current session. If you want it to be permanent, that value has to be configured in the /etc/security/limits.conf configuration file.
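Before raising the limit, it can help to check how close a Java process actually is to it. The following sketch assumes a HotSpot/OpenJDK runtime on a Unix-like system, where the operating system MXBean also implements com.sun.management.UnixOperatingSystemMXBean; on other platforms the method simply returns null. The class name is hypothetical, and the values refer to the JVM that runs the code, not necessarily to the one that runs Solr.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

class FileDescriptorCheck {

    /** Returns {open, max} file descriptor counts, or null if the platform does not expose them. */
    static long[] fileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),  // descriptors currently open
                unix.getMaxFileDescriptorCount()    // the limit that ulimit -n controls
            };
        }
        return null; // for example on Windows, or on a JVM without this extension
    }
}
```

The same two numbers are what the third dashboard bar displays, so this is mainly useful in scripts or tests where the console is not at hand.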

Logging

The Administration Console allows you to see log messages (also available in a log file) and change the log settings.

While the first feature is useful only if you don’t have access to the log files (inspecting log files with Unix command-line tools is definitely more powerful than doing the same with the AJAX-refreshed page), managing log settings is very useful because it doesn’t require manual edits or server restarts. So, if you want to limit the priority level of log messages on the fly, or debug the behavior of a component, this is the right place to do so.

Tip: A verbose log level can slow down index operations, so it’s better to check log settings before calling the /update request handler. For the same reason, remember that Solr logs all query requests at the INFO level. Depending on how many users your application has, this could lead to a huge amount of log messages.

Core Admin

The Core Admin section is a central point where you can manage registered cores. You can create a new core on the fly (assuming that the core instance and data directories exist on the disk) or manage the existing cores one by one, selecting them from the list on the left. The following screenshot shows the Core Admin page of the Solr instance set up for this chapter:

The top toolbar contains these buttons:

Button Description

Unload Unloads the core. The core will be removed after pending requests are processed.

Rename Changes the core name. Note that this change will affect the URI endpoints of the core services.

Swap Swaps two active cores. This is useful for switching between two versions (that is, online and offline versions) of the same core. Note that both of them will still be alive after issuing the swap command.

Reload Reloads a core. The current core instance will be available only for satisfying pending requests. This command is useful if some (backward-compatible) changes have been made to the solrconfig.xml or schema.xml configuration files or core libraries and you want to load those changes.

Optimize Issues an optimize command to the selected core.

The central area shows the following information about the core and the corresponding index:

Attribute Description

startTime The core start (or reload) time.

instanceDir The top core folder. It contains a conf subfolder that contains Solr configuration files (schema.xml, solrconfig.xml, and dependent files).

dataDir The folder containing the index data files.

lastModified The last modification date of the index.

version A version number assigned to the IndexReader instance associated with the index.

numDocs The number of searchable documents in the index. In other words, this is the number of documents you can get back from a *:* query.

maxDocs The number of internal document identifiers actually in use. The difference between maxDocs and numDocs indicates how many documents have been deleted or replaced. The old (deleted and replaced) identifiers are gradually removed during merges or after issuing an index optimize.

deletedDocs The number of deleted documents. It also includes replaced documents because Solr doesn’t actually support updates; it simply deletes a given document and subsequently adds its new version. This is basically the difference between maxDocs and numDocs after a commit and before merging or optimizing.

optimized Indicates whether the index has been optimized.

current Indicates whether the index has been committed.

directory The underlying Lucene Directory implementation.

Java properties and thread dump

Java properties form a read-only section where you can see the system properties associated with the current JVM instance.

Tip: Remember that you can use those variables in solrconfig.xml, so you may want to check in this page whether a specific property has the expected value.

The thread dump page shows a snapshot of what live threads in the JVM are doing at a given instant. The same information can be retrieved using the jstack command-line utility available in the JDK.
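A comparable snapshot can also be produced programmatically through the standard java.lang.management API. The following is a minimal sketch with a hypothetical class name; it dumps the threads of the JVM in which it runs, so it would need to be executed inside the Solr process (or adapted to a remote JMX connection) to describe Solr itself.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class ThreadDumpSketch {

    /** Returns a textual dump with the name, state, and stack trace of every live thread. */
    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false, false: skip lock details, which keeps the call cheap and always supported
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }
}
```

Taking two or three dumps a few seconds apart and comparing them is a common way to spot threads that stay stuck in the same frame.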

Tip: Thread dumps are very useful for debugging high-CPU-usage scenarios and deadlocks.

Unlike log analysis, the user interface here is definitely more user-friendly than manual inspection of the jstack output.

Core overview

Selecting one of the available cores in the drop-down list on the left side of the Administration Console will open a core-dedicated area, with several other sections. The first section is an overview of the selected core. It reports more or less the same information that we saw in the dashboard and in the Core Admin page.

Here, there is additional information about the health check (heartbeat information enabled only if you configured the ping request handler) and the replication status.

The replication section shows the index status of the master and slave (only if the current Solr instance acts as a slave) in terms of replicability.

Tip: The replication section is useful for monitoring master-repeater-slave instances, especially when you get some synchronization issues within the Solr ensemble. Note that the console also has a dedicated Replication section where that information is more detailed.

The master-slave replication architecture is explained in the next chapter.

Caches

To speed up query execution, Solr stores data using several types of in-memory caches. Caches transparently store filters, documents, and identifiers so that future requests for the same data can be served faster. If you run the same search twice, you will see in the Solr log a marked difference between the first and the second query in terms of response time, as shown in the following example:

… params={q=history&fq=catalog:NRA} hits=17298 status=0 QTime=78
… params={q=history&fq=catalog:NRA} hits=17298 status=0 QTime=2

Solr comes with several kinds of caches. They can be configured and tuned in solrconfig.xml:

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
<documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
<fieldValueCache class="solr.FastLRUCache" size="512" autowarmCount="128" showItems="32"/>

The following table briefly describes the types of caches available in Solr:

Cache Description

FilterCache Holds the document identifiers associated with filter queries that have been executed.

QueryResultCache Holds the document identifiers resulting from queries that have been executed.

DocumentCache Holds Lucene document instances for quick access to their stored fields.

FieldCache A low-level Lucene field cache that is not managed by Solr (in other words, it cannot be configured). It is used for sorting and faceting.

FieldValueCache This is a field cache very similar to FieldCache, but it can be configured. It is mainly used for faceting.

CustomCache Application-level caches used to hold custom user/application data.

Cache lifecycles

A cache is always associated with an index searcher instance, and it follows the same lifecycle as that instance. This means that, when an index searcher is instantiated (on startup or after a commit), cache instances are created and associated with it. As a consequence of this, caches and cached objects don’t have an expiry time; they will be valid as long as the owning index searcher instance is active.

When a searcher is instantiated, and if it is not the first searcher (that is, the one created at startup time), caches can be optionally auto-warmed; that is, they can be prepopulated with some data coming from their predecessors (the caches of the previous searcher). The autowarmCount attribute allows us to declare the maximum amount of data (absolute or a percentage) that can be used to prepopulate the new cache.

Note: Data from the previous cache is not taken as it is. It has to be validated against the new searcher’s “view” of the index. A given object previously cached may no longer be valid after the new searcher has been opened; for example, it could have been deleted. The autowarmCount attribute refers only to valid entries.

When a new searcher is opened, the current searcher will continue to serve pending requests. After that, it will be closed and the orphan caches will be subjected to garbage collection.

Cache sizing

Cache size can refer to two different measures: the total count of objects a cache contains at a specific moment, and the maximum number of objects a cache can hold.

Within solrconfig.xml, you can configure the minimum (initial) and maximum size of a cache by means of the initialSize and size attributes, respectively:

<filterCache … class="…" size="512" initialSize="512"/>

The initialSize attribute is used when the cache instance is created. It preallocates a given number of slots for objects that will be cached.

The ideal dimension of a cache strictly depends on the application. One could erroneously think “the bigger, the better”, but this is a half-truth. A huge cache would have the advantage of holding all the required structures in memory, thus allowing fast access to that information. However, unless your index is completely static and never changes, you will sooner or later add, update, or remove something, and you will need to commit those changes. A commit will open a new searcher, which in turn will create new caches, and the (old) huge caches will be discarded.

In this situation, the garbage collector will have a lot of work to do reclaiming all objects from the old caches. Worse, if you have configured auto-warming, the prepopulation of the newly created caches could take a lot of time.

In other words, this scenario requires a lot of memory to manage all of those objects. From my experience, I can tell you that this is one of the common ways of getting “OutOfMemory” error messages. Remember that garbage collection is not under your control, so most probably there will be a given interval of time during which the JVM must hold both new and old object references.

The suggestion here is to start with default sizes, and then use the Solr Administration Console to constantly monitor how things move. Cache management is not a do-once-and-forget task. Caches must be periodically monitored and, when needed, tuned in order to gain optimal advantage for your application.

Cached object lifecycle

The class attribute of a cache determines primarily its implementation, but most importantly, it defines how objects are managed within the cache. In other words, it implements the logic needed to know what to do when the cache reaches its maximum size and which objects must be evicted when a new entry arrives.

Solr offers three cache implementations:

LRUCache: Once the maximum size of the cache has been reached and a new object needs to be cached, this implementation will remove the oldest entry. The age of an object is determined by the last time it was requested from the cache.
FastLRUCache: This implements behavior similar to LRUCache but uses a separate thread to (asynchronously) clean up the oldest entries.
LFUCache: This policy implements an eviction based on the popularity of each object in the cache (that is, how many times a given object in the cache has been requested).
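To make the LRU policy more concrete, here is a minimal sketch of a least-recently-used cache built on java.util.LinkedHashMap. It is not Solr’s LRUCache class; the class name is hypothetical, and the only point is to show the eviction rule described above (the entry whose last access is the oldest goes first).

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {

    private final int maxSize;

    LruCacheSketch(int maxSize) {
        // accessOrder = true: entries are ordered by last access, oldest first
        super(16, 0.75f, true);
        this.maxSize = maxSize;
    }

    /** Called by LinkedHashMap after each insertion; returning true evicts the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize;
    }
}
```

With a maximum size of 2, putting a and b, reading a, and then putting c evicts b, because a was requested more recently. An LFU policy would instead keep a counter per entry and evict the one with the fewest requests.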

Cache stats

For each cache, the Administration Console reports (Plugin/Stats | Cache) the following attributes:

Attribute Description

lookups The total count of lookup requests.

hits The number of requests that successfully found the requested object.

hitratio The ratio between the number of hits and the total number of lookups. A value of 1 represents optimal usage of the cache (every requested object has been found in the cache).

inserts The total number of inserted objects.

evictions The total number of evictions (objects removed).

size The current size of the cache.

warmupTime The time needed to auto-warm the cache.

cumulative_lookups, cumulative_hits, cumulative_hitratio, cumulative_inserts, cumulative_evictions The same measures as above, accumulated across all cache instances of the same type.

A cache instance dies when the associated searcher is discarded. The cumulative attributes retain lookups, hits, hit ratio, inserts, and evictions among all cache instances (of the same type), so the value of those attributes measures the same things we just saw but cumulatively, since Solr startup.
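The relationship between lookups, hits, inserts, and hitratio can be sketched with a small counting wrapper around a map. This is an illustration only, with hypothetical names; it does not reproduce how Solr collects its statistics, and it ignores evictions for brevity.

```java
import java.util.HashMap;
import java.util.Map;

class CountingCache<K, V> {

    private final Map<K, V> entries = new HashMap<>();
    private long lookups;
    private long hits;
    private long inserts;

    /** Every call is a lookup; it is also a hit when the key is present. */
    V get(K key) {
        lookups++;
        V value = entries.get(key);
        if (value != null) {
            hits++;
        }
        return value;
    }

    void put(K key, V value) {
        inserts++;
        entries.put(key, value);
    }

    long lookups() { return lookups; }
    long hits() { return hits; }
    long inserts() { return inserts; }

    /** hits / lookups, or 0 when nothing has been requested yet. */
    double hitRatio() {
        return lookups == 0 ? 0.0 : (double) hits / lookups;
    }
}
```

For instance, one miss followed by an insert and two successful lookups gives 3 lookups, 2 hits, and a hit ratio of about 0.67. A ratio that stays low while evictions grow is usually the signal that a cache is too small for the workload.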

Types of cache

As we have briefly described, Solr comes with several kinds of caches. The following paragraphs describe them further.

Filter cache

Each time a filter query is executed, Solr places a new entry in a filter cache. A filter cache is a kind of map where the key is represented by the filter query string (for example, catalog:NRA or genre:Jazz) and the entry is a list of all matching document identifiers.

The filter cache is configured in the solrconfig.xml file, in the following fragment:

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>

Filter queries play a crucial role in performance and response time optimization. The cached identifiers can be used and reused with subsequent queries; briefly, requests that contain cached filter queries will improve overall performance because those queries won’t be actually executed again.

Auto-warming a filter cache means refreshing every cached filter query result by executing (again) all of those queries against the index view represented by the new searcher. Let’s see this with a concrete example; the sample Solr instance contains 24 albums. At startup time, the filter cache is empty. Now let’s suppose the following queries are executed:

http://127.0.0.1:8983/solr/example/query?q=*:*&fq=genre:Jazz (3 results)
http://127.0.0.1:8983/solr/example/query?q=*:*&fq=genre:Fusion (4 results)
http://127.0.0.1:8983/solr/example/query?q=*:*&fq=released:1986 (2 results)

The three filter queries populate the filter cache as described in the following table:

Cache entries (filter queries) Query results (document identifiers)

genre:Jazz 1, 2, 3

genre:Fusion 4, 5, 6, 7

released:1986 6, 8

Now we decide to remove document #6. In order to do this, we send a delete command and then a commit command. Once the change has been committed, document #6 no longer exists. A new searcher is opened, and the cache content needs to be refreshed because it still contains an invalid entry. So, the auto-warming process simply repeats each filter query in the cache (genre:Jazz, genre:Fusion, and released:1986 in this case) and refreshes the content with valid query results. After the auto-warming, the filter cache will have the following content:

Cache entries (filter queries) Query results (document identifiers)

genre:Jazz 1, 2, 3

genre:Fusion 4, 5, 7

released:1986 8

This re-execution is in general the cost of auto-warming, which is directly connected with the cache size (a huge cache in most cases will take some time to re-execute all cached queries).
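The example with document #6 can be replayed with a small in-memory model. The sketch below is only an illustration of the auto-warming idea, not Solr code: the index is reduced to a map from document identifiers to the field:value terms mentioned in the text, and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class FilterCacheWarmSketch {

    /** Toy index view: document id -> the "field:value" terms it matches (only those named in the text). */
    static Map<Integer, Set<String>> sampleIndex() {
        Map<Integer, Set<String>> index = new TreeMap<>();
        for (int id = 1; id <= 3; id++) {
            index.put(id, new HashSet<>(Arrays.asList("genre:Jazz")));
        }
        for (int id = 4; id <= 7; id++) {
            index.put(id, new HashSet<>(Arrays.asList("genre:Fusion")));
        }
        index.get(6).add("released:1986");
        index.put(8, new HashSet<>(Arrays.asList("released:1986")));
        return index;
    }

    /** Executes a filter query against an index view and returns the matching ids in ascending order. */
    static List<Integer> execute(String filterQuery, Map<Integer, Set<String>> index) {
        List<Integer> ids = new ArrayList<>();
        for (Map.Entry<Integer, Set<String>> doc : index.entrySet()) {
            if (doc.getValue().contains(filterQuery)) {
                ids.add(doc.getKey());
            }
        }
        return ids;
    }

    /** Auto-warming: every filter query cached by the old searcher is executed again on the new view. */
    static Map<String, List<Integer>> autowarm(Map<String, List<Integer>> oldCache,
                                               Map<Integer, Set<String>> newIndex) {
        Map<String, List<Integer>> warmed = new LinkedHashMap<>();
        for (String filterQuery : oldCache.keySet()) {
            warmed.put(filterQuery, execute(filterQuery, newIndex));
        }
        return warmed;
    }
}
```

Populating a cache with the three filter queries, removing document 6 from the index map, and calling autowarm reproduces the second table. The loop in autowarm also makes the cost visible: one query execution per cached entry, which is why a large autowarmCount on a large cache slows down the opening of a new searcher.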

QueryResult cache

With this kind of cache, each time a query is executed, its results (in terms of matching document identifiers) are cached for future reuse. This is configured in the following fragment of the solrconfig.xml file:

<queryResultCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>

The underlying reason is that popular queries (that is, queries that are often repeated) will gain a clear advantage here because they won’t be actually executed again; their results are already computed.

Note: Other than popular queries, pagination mechanisms also benefit from this cache. When the user asks for the next or the previous page of results for a given query execution, Solr will repeat the query but with a different start parameter.

Document cache

Both FilterCache and QueryResultCache store document identifiers. So, for a given query, Solr computes the matching identifiers; for each of them, it needs to query the index to retrieve its stored fields. After that, the response is populated with those documents and their corresponding (stored) fields.

DocumentCache caches Lucene documents, so once a query has been executed, Solr doesn’t need (with regard to documents that are found in this cache) to query the index to populate the list of results.

Tip: If you have huge stored fields (for example, full-text fields used for highlighting), be aware that you cannot specify which fields must be in the cache. Therefore, huge fields may require a lot of memory.

Field value cache

The field value cache has a map structure where keys are field names and values are uninverted fields. This structure maps document identifiers with values. If it is not explicitly declared, this cache is automatically generated with an initial size of 10, a maximum size of 10000, and no auto-warming. It is primarily used for faceting.

Custom cache

Custom caches are intended for developers who write their own Solr extensions. Unlike the other types, custom caches accept a regenerator attribute, which declares a class that implements the auto-warming logic for the cache.

Query handlers

The page accessed by navigating to Plugin/Stats | QueryHandler shows an expandable list where each item is a query handler configured in solrconfig.xml. This list includes handlers that represent search endpoints (that is, SearchHandler) but also other handlers such as /admin/ping, /admin/dump, and /debug.

The configured UpdateRequestHandler instances (for example, /update and /update/json), being subclasses of RequestHandler, are also listed in this page.

For each handler, the console shows some basic attributes such as the class name, version, a short description, and a set of statistical data, as listed in the following table:

Attribute Description

handlerStart The date (in milliseconds) when the handler received its first request.

requests The total number of requests received.

errors The number of requests that raised an exception during the execution.

timeouts If the query is executed with the timeAllowed parameter and the given timeout expires, Solr will return only partial results. This attribute counts the requests that face this scenario.

totalTime The total (requests) execution time.

avgRequestsPerSecond The average number of requests per second.

5minRateReqsPerSecond, 15minRateReqsPerSecond The average number of requests per second over the last five and fifteen minutes, respectively.

avgTimePerRequest The average (request) execution time.

75thPcRequestTime, 95thPcRequestTime, 99thPcRequestTime, 999thPcRequestTime Starting from the distribution of the total request execution times, these attributes report the value at the 75th, 95th, 99th, and 99.9th percentile in that distribution, respectively.
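To clarify what a percentile attribute such as 95thPcRequestTime expresses, the following sketch computes percentiles from a list of request times with the nearest-rank method. This is an assumption made for illustration: Solr derives its figures from its own metrics collection and may use a different estimator, and the class and method names here are hypothetical.

```java
import java.util.Arrays;

class RequestTimePercentiles {

    /**
     * Nearest-rank percentile: the smallest sample such that at least
     * p percent of all samples are less than or equal to it.
     */
    static double percentile(double[] samples, double p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

With request times of 10, 20, 30, and 40 milliseconds, the 50th percentile is 20 and the 75th is 30. Compared with avgTimePerRequest, the high percentiles are far less sensitive to a majority of fast requests hiding a few very slow ones.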

So, especially for search endpoints, this page is very useful to understand and monitor the usage and the statistical behavior of your Solr instance.

Update handlers

Under the same path (Plugin/Stats), the UpdateHandler is a page containing an entry corresponding to the org.apache.solr.update.DirectUpdateHandler2 instance.

The following table lists and describes the attributes of that handler:

Attribute Description

commits The total number of commit requests received.

autocommit maxTime The maximum amount of time that is allowed to pass since a document was added before automatically triggering a new commit.

autocommits The total number of hard auto-commits executed.

soft autocommits The total number of soft auto-commits executed.

optimizes The total number of optimize requests received.

rollbacks The total number of rollback requests received.

expungeDeletes The total number of hard commits with the expungeDeletes flag set to true.

docsPending The total number of updates that have been processed but not committed.

adds The total number of add requests received.

deletesById The total number of deleteById requests received.

deletesByQuery The total number of deleteByQuery requests received.

errors The total number of failed operations (for example, updates, commits, and rollbacks).

cumulative_adds, cumulative_deletesById, cumulative_deletesByQuery, cumulative_errors The same measures as above, accumulated since the Solr startup.

UpdateHandler has a lifecycle associated with the owning SolrCore. In other words, when SolrCore is reloaded, a new instance of UpdateHandler is created. The monitoring attributes prefixed with cumulative are a cumulative measure of a specific attribute (for example, additions and deletions) since the Solr startup.

Most Solr installations I’ve done in libraries update the index on a daily basis. Each morning, the UpdateHandler stats page shows a perfect summary of what happened during the previous day and cumulatively since the last startup. Clearly, in the event of errors, log files serve as my friends.

On the other hand, if I need to monitor the overall progress of an index update in real time, then I prefer the JMX way, which is described in the next section.

JMX

Java Management Extensions (JMX) are a powerful set of APIs used to monitor and manage a running JVM. The building blocks of JMX are the so-called Management Beans (MBeans), which are basically wrappers that decorate existing objects with a management interface. The core classes of the JVM are decorated with MBeans.

Tip: More information about JMX can be found at http://www.oracle.com/technetwork/java/javase/tech/javamanagement-140525.html.

MBeans are registered with an MBean Server that exposes those management interfaces to external clients. Applications are free to create, register, and expose the management interface of their own specific services. Solr MBeans are not automatically registered with the MBean Server, but if you want to do that, just write (or uncomment) the following line in solrconfig.xml:

<jmx/>

The JVM comes with two built-in JMX clients called JConsole and JVisualVM.

Tip: JVisualVM and JConsole are very similar tools. Here, we will talk only about JConsole because JVisualVM doesn’t have the MBeans perspective out of the box.

Open a shell on your PC and type the following command:

# $JAVA_HOME/bin/jconsole

A dialog pop-up will appear. This is the first screen of JConsole, which is a Java standalone application. The dialog contains a list of locally running JVMs. One of them should be the one where Solr is running. Select that entry, and you should see a screen with several tabs: Overview, Memory, Threads, Classes, VM Summary, and MBeans. At the moment, we are interested in the last tab, MBeans. Here you can see (in the tree component on the left side) all registered MBeans, as depicted in the following screenshot:

For each MBean in the tree, you can see its management interface in the right pane. A management interface is composed of attributes and operations.

Operations can be invoked and attributes can be monitored by looking at their value at a given moment or for a given interval. To do this, you have to double-click on them and activate a real-time chart.
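What JConsole does interactively can also be done in code through the javax.management API. The following sketch reads an attribute of one of the JVM’s built-in MBeans (java.lang:type=Memory) from the platform MBean Server and counts the MBeans that match a name pattern. The class name is hypothetical. The object names under which Solr registers its own MBeans depend on the version and configuration, so a pattern such as solr*:* is mentioned only as an assumption to verify in the JConsole tree.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

class MBeanAttributeReader {

    /** Reads the "used" component of the HeapMemoryUsage attribute of the JVM Memory MBean. */
    static long usedHeapBytes() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName memory = new ObjectName("java.lang:type=Memory");
            CompositeData usage = (CompositeData) server.getAttribute(memory, "HeapMemoryUsage");
            return (Long) usage.get("used");
        } catch (Exception e) {
            throw new IllegalStateException("cannot read the Memory MBean", e);
        }
    }

    /** Counts the MBeans whose object name matches the given pattern, for example "java.lang:*". */
    static int countMBeans(String pattern) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            return server.queryNames(new ObjectName(pattern), null).size();
        } catch (Exception e) {
            throw new IllegalStateException("invalid pattern: " + pattern, e);
        }
    }
}
```

Polling such an attribute at a fixed interval is essentially what the real-time chart in JConsole does. Reading the MBeans of a separate Solr process would additionally require a remote JMX connection, which is outside the scope of this sketch.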

The main differences between the Solr Administration Console and JConsole are as follows:

The Solr Administration Console, being a web application, offers static snapshots of the system.
With JConsole, it’s possible to activate real-time monitoring of one or more attributes. This is not limited to MBean attributes. In the other tabs, you can monitor threads, processors, memory, and garbage collection.
JConsole has a finer level of granularity than the Administration Console. There, we can see all attributes and operations exposed for management.
JConsole, being more technical, is less usable than the Administration Console.

Clearly, JConsole, JVisualVM, and the Solr Administration Console are not alternatives. They should be used together in order to get a different perspective on the system.

Summary

In this chapter, we described some concepts about Solr administration and monitoring. We introduced a few system administration tools such as the Solr Administration Console and JConsole, and we covered hardware resources.

Remember that, although the topics covered in this chapter should be relevant for an administrator, nowadays this role is spread among several people (especially in small and medium companies) who are mostly developers (a developer in a small or medium company is like a “factotum”). This is the reason it is important for non-administrators to have at least a basic understanding of administration, management, and monitoring.

In the next chapter, you will see how Solr can be deployed in the context of development, testing, and production. We will illustrate and describe several deployment scenarios, starting from the simplest, a standalone instance, continuing with a gradually growing level of complexity, and ending with SolrCloud. SolrCloud is a highly available, fault-tolerant cluster of Solr servers that provides distributed indexing and search capabilities.

Chapter 6. Deployment Scenarios

This chapter contains information on the various ways in which you can deploy Solr, including key features and pros and cons for each scenario.

Solr has a wide range of deployment alternatives, from monolithic to distributed indexes and standalone to clustered instances. We will organize this chapter by deployment scenarios, with a growing level of complexity.

This chapter will cover the following topics:

Sharding
Replication: master, slave, and repeaters
SolrCloud


Standalone instance

All the examples we found in the previous chapters use a standalone instance of Solr, that is, one or more cores managed by a Solr deployment hosted in a standalone servlet container (for example, Jetty, Tomcat, and so on).

This kind of deployment is useful for development because, as you learned, it is very easy to start and debug. Besides, it can also be suitable for a production context if you don’t have strict non-functional requirements and have a small or medium amount of data.

Tip: I have used a standalone instance to provide autocomplete services for small and medium intranet systems.

Anyway, the main features of this kind of deployment are simplicity and maintainability; one simple node acts as both an indexer and a searcher. The following diagram depicts a standalone instance with two cores:


Shards

When a monolithic index becomes too large for a single node or when additions, deletions, or queries take too long to execute, the index can be split into multiple pieces called shards.

Note: The previous sentence highlights a logical and theoretical evolution path of a Solr index. However, this (in general) is valid for all scenarios we will describe. It is strongly recommended that you perform a preliminary analysis of your data and the estimated growth factor in order to decide from the beginning the right configuration that suits your requirements. Although it is possible to split an existing index into shards (https://lucene.apache.org/core/4_10_3/misc/org/apache/lucene/index/PKIndexSplitter.html), things definitely become easier if you start directly with a distributed index (if you need it, of course).

The index is split vertically so that each shard contains a disjoint set of the entire index. Solr will query and merge results across those shards. The following diagram illustrates a Solr deployment with three nodes; this deployment consists of two cores (C1 and C2) divided into three shards (S1, S2, and S3):

When using shards, only query requests are distributed. This means that it’s up to the indexer to add and distribute the data across nodes, and to subsequently forward a change request (that is, delete, replace, and commit) for a given document to the appropriate shard (the shard that owns the document).

Tip: The Solr Wiki recommends a simple, hash-based algorithm to determine the shard where a given document should be indexed:

documentId.hashCode() % numServers

Using this approach is also useful in order to know in advance where to send delete or update requests for a given document.
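The hash-based routing rule can be tried outside Solr with a few lines of Python. This is only a sketch under stated assumptions: `java_string_hash` and `shard_for` are hypothetical helper names, the hash mimics Java's `String.hashCode()` for identifiers made of Basic Multilingual Plane characters, and the result is forced to be non-negative (in Java, a negative `hashCode()` would produce a negative remainder, which an indexer would have to handle).

```python
def java_string_hash(text: str) -> int:
    """Mimic Java's String.hashCode() and return a signed 32-bit integer."""
    h = 0
    for ch in text:
        # Same recurrence as Java: h = 31 * h + char, truncated to 32 bits.
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Map the unsigned 32-bit value into Java's signed int range.
    return h - 0x100000000 if h >= 0x80000000 else h


def shard_for(document_id: str, num_servers: int) -> int:
    """Return the index (0-based) of the shard that owns document_id."""
    # Python's % is non-negative for a positive divisor, so no extra
    # handling of negative hash codes is needed here.
    return java_string_hash(document_id) % num_servers


if __name__ == "__main__":
    for doc_id in ("9920", "4392"):
        print(doc_id, "-> shard", shard_for(doc_id, 2))
```

Because the function is deterministic, the same call tells the indexer where to send a later delete or update for that document.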

On the opposite side, a searcher client will send a query request to any node, but it has to specify an additional shards parameter that declares the target shards that will be queried. In the following example, assuming that two shards are hosted in two servers listening on ports 8080 and 8081, the same request when sent to both nodes will produce the same result:

http://localhost:8080/solr/c1/query?q=*:*&shards=localhost:8080/solr/c1,localhost:8081/solr/c2

http://localhost:8081/solr/c2/query?q=*:*&shards=localhost:8080/solr/c1,localhost:8081/solr/c2
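A client can assemble such a request programmatically. The following sketch uses only the Python standard library; `distributed_query_url` is a hypothetical helper, and the host names and core paths are the ones from the example above. Nothing is sent over the network; the function only builds the URL.

```python
from urllib.parse import urlencode


def distributed_query_url(entry_point, shards, query="*:*"):
    """Build a query URL that asks entry_point to search all given shards."""
    params = {"q": query, "shards": ",".join(shards)}
    # Keep '*', ':', '/' and ',' unescaped so the URL stays readable.
    return f"http://{entry_point}/query?" + urlencode(params, safe="*:/,")


if __name__ == "__main__":
    shards = ["localhost:8080/solr/c1", "localhost:8081/solr/c2"]
    # Either node can act as the entry point; the shards list is the same.
    for node in shards:
        print(distributed_query_url(node, shards))
```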

When sending a query request, a client can optionally include, in the field list (the fl parameter), a pseudo field associated with the [shard] transformer. In this case, as a part of each returned document, there will be additional information indicating the owning shard. This is an example of such a request:

http://localhost:8080/solr/c1/query?q=*:*&shards=localhost:8080/solr/c1,localhost:8081/solr/c2&fl=*,src_shard:[shard]

Here is the corresponding response (note the pseudo field aliased as src_shard):

<result name="response" numFound="192" start="0">
  <doc>
    <str name="id">9920</str>
    <str name="brand">Fender</str>
    <str name="model">Jazz Bass</str>
    <arr name="artist">
      <str>Marcus Miller</str>
    </arr>
    <str name="series">Marcus Miller signature</str>
    <str name="src_shard">localhost:8080/solr/shard1</str>
  </doc>
  <doc>
    <str name="id">4392</str>
    <str name="brand">Music Man</str>
    <str name="model">StingRay</str>
    <arr name="artist"><str>Tony Levin</str></arr>
    <str name="series">5 strings DeLuxe</str>
    <str name="src_shard">localhost:8081/solr/shard2</str>
  </doc>
</result>

The following are a few things to keep in mind when using this deployment scenario:

The schema must have a uniqueKey field. This field must be declared as stored and indexed; in addition, it is supposed to be unique across all shards.

Inverse Document Frequency (IDF) calculations cannot be distributed. IDF is computed per shard.

Joins between documents belonging to different shards are not supported.

If a shard receives both index and query requests, the index may change during a query execution, thus compromising the outgoing results (for example, a matching document that has been deleted).
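To see why per-shard IDF matters, consider a term that is rare in one shard and common in another. The sketch below assumes the classic Lucene TF-IDF formula, idf = 1 + ln(numDocs / (docFreq + 1)); the document counts are invented purely for illustration.

```python
import math


def idf(num_docs: int, doc_freq: int) -> float:
    """Classic Lucene-style inverse document frequency (assumed formula)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Illustrative numbers only: two shards of 1000 documents each.
idf_shard1 = idf(1000, 9)      # the term is rare in shard 1
idf_shard2 = idf(1000, 499)    # the term is common in shard 2
idf_global = idf(2000, 508)    # what a single, monolithic index would use

if __name__ == "__main__":
    print(f"shard 1: {idf_shard1:.3f}")
    print(f"shard 2: {idf_shard2:.3f}")
    print(f"global : {idf_global:.3f}")
```

The same term therefore contributes a different weight depending on the shard that scores the document, so scores coming from different shards are not strictly comparable when the term distribution is skewed.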


Master/slaves scenario

In a master/slaves scenario, there are two types of Solr servers: an indexer (the master) and one or more searchers (the slaves).

The master is the server that manages the index. It receives update requests and applies those changes. A searcher, on the other hand, is a Solr server that exposes search services to external clients.

The index, in terms of data files, is replicated from the indexer to the searcher through HTTP by means of a built-in RequestHandler that must be configured on both the indexer side and searcher side (within the solrconfig.xml configuration file).

On the indexer (master), a replication configuration looks like this:

<requestHandler
    name="/replication"
    class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">optimize</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

The replication mechanism can be configured to be triggered after one of the following events:

Commit: A commit has been applied

Optimize: The index has been optimized

Startup: The Solr instance has started

In the preceding example, we want the index to be replicated after startup and optimize commands. Using the confFiles parameter, we can also indicate a set of configuration files (schema.xml and stopwords.txt, in the example) that must be replicated together with the index.

Note: Remember that changes on those files don’t trigger any replication. Only a change in the index, in conjunction with one of the events we defined in the replicateAfter parameter, will mark the index (and the configuration files) as replicable.

On the searcher side, the configuration looks like the following:

<requestHandler
    name="/replication"
    class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://<localhost>:<port>/solrmaster</str>
    <str name="pollInterval">00:00:10</str>
  </lst>
</requestHandler>

You can see that a searcher periodically keeps polling the master (the pollInterval parameter) to check whether a newer version of the index is available. If it is, the searcher will start the replication mechanism by issuing a request to the master, which is completely unaware of the searchers.

The replicability status of the index is actually indicated by a version number. If the searcher has the same version as the master, it means the index is the same. If the versions are different, it means that a newer version of the index is available on the master, and replication can start.
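The version check behind polling can be modeled in a few lines. This is a toy, in-memory sketch and not Solr code: `Master`, `Searcher`, and their attributes are hypothetical names, the index is reduced to a list of file names, and the version is a plain counter.

```python
class Master:
    """Toy indexer: owns the index files and a version counter."""

    def __init__(self):
        self.version = 1
        self.files = ["segments_1"]

    def commit(self, new_file):
        # Every index change produces a new replicable version.
        self.files.append(new_file)
        self.version += 1


class Searcher:
    """Toy searcher: pulls the index when the master's version differs."""

    def __init__(self, master):
        self.master = master   # the master never references its searchers
        self.version = 0
        self.files = []

    def poll(self):
        """Run once per poll interval; return True if a copy was fetched."""
        if self.master.version == self.version:
            return False       # same version, same index: nothing to do
        self.files = list(self.master.files)
        self.version = self.master.version
        return True


if __name__ == "__main__":
    master = Master()
    searcher = Searcher(master)
    print(searcher.poll())     # first poll fetches the index
    print(searcher.poll())     # nothing changed since the last poll
    master.commit("segments_2")
    print(searcher.poll())     # a newer version is available
```

Note that all the initiative is on the searcher side, which mirrors the fact that the master is unaware of the searchers.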

Other than separating responsibilities, this deployment configuration allows us to have a so-called diamond architecture, consisting of one indexer and several searchers. When the replication is triggered, each searcher in the ring will receive a whole copy of the index. This allows the following:

Load balancing of the incoming (query) requests.

An increment to the availability of the whole system. In the event of a server crash, the other searchers will continue to serve the incoming requests.

The following diagram illustrates a master/slave deployment scenario with one indexer, three searchers, and two cores:

If the searchers are in several geographically distributed data centers, an additional role called repeater can be configured in each data center in order to rationalize the replication data traffic flow between nodes. A repeater is simply a node that acts as both a master and a slave. It is a slave of the main master, and at the same time, it acts as master of the searchers within the same data center, as shown in this diagram:


Shards with replication

This scenario combines shards and replication in order to have a scalable system with high throughput and availability. There is one indexer and one or more searchers for each shard, allowing load balancing between (query) shard requests. The following diagram illustrates a scenario with two cores, three shards, one indexer, and (due to problems with available space), only one searcher for each shard:

The drawback of this approach is undoubtedly the overall growing complexity of the system that requires more effort in terms of maintainability, manageability, and system administration. In addition to this, each searcher is an independent node, and we don’t have a central administration console where a system administrator can get a quick overview of system health.

These disadvantages have been either mitigated or overcome in SolrCloud, which is described in the next section.


SolrCloud

SolrCloud is a highly available, fault-tolerant cluster of Solr servers that provides distributed indexing and search capabilities. The following diagram illustrates a simple SolrCloud scenario:

Although SolrCloud introduced a new terminology to define things in a distributed domain, the preceding diagram has been drawn with the same concepts that we saw in the previous scenarios, for better understanding.

Tip: Starting from Solr 4.10.0, the download bundle contains an interactive, wizard-like command-line setup for a sample SolrCloud installation. A step-by-step guide for this is available at https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud.

The following sections will describe the relevant aspects of SolrCloud.

Cluster management

Apache Zookeeper was introduced in SolrCloud for cluster coordination and configuration. This means it is a central actor in this scenario, providing discovery, configuration, and lookup services for other components (including clients) to gather information about the Solr cluster.

Apache Zookeeper, being a central component, can be organized in a cluster itself (as depicted in the previous diagram) in order to avoid a single point of failure. A cluster of Zookeeper nodes is called an ensemble.

Tip: For more information about Apache Zookeeper, visit http://zookeeper.apache.org, the project home page.

Replication factor, leaders, and replicas

In the preceding diagram, we have only one core (C1) with three shards (S1, S2, and S3). Now, the main difference between the previous distributed scenario (where we met shards) and this scenario is that here, there’s a copy of each shard in every node. That copy is called a replica. In this example, we have three copies for each shard, but this is just for simplicity; you can have as many copies as you want.

More specifically, SolrCloud has a property called replication factor, which determines the total number of copies in the cluster for each shard. Among the copies, one is elected as the leader (the letter “L” on C1/S1 on the first node) while the remaining are replicas (the letter “R”).
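The relationship between shards, nodes, and the replication factor can be made concrete with a small placement sketch. This is not how SolrCloud assigns copies or elects leaders (election is coordinated through Zookeeper); `layout` is a hypothetical function that simply spreads copies over nodes in a rotating fashion and, purely for illustration, treats the first copy of each shard as its leader.

```python
def layout(shards, nodes, replication_factor):
    """Place replication_factor copies of each shard on distinct nodes."""
    if replication_factor > len(nodes):
        raise ValueError("each copy of a shard needs its own node")
    placement = {}
    for i, shard in enumerate(shards):
        # Rotate the starting node so that leaders are spread over the cluster.
        copies = [nodes[(i + k) % len(nodes)] for k in range(replication_factor)]
        placement[shard] = {"leader": copies[0], "replicas": copies[1:]}
    return placement


if __name__ == "__main__":
    # Replication factor 3 on three nodes, as in the diagram described above.
    for shard, copies in layout(["S1", "S2", "S3"], ["n1", "n2", "n3"], 3).items():
        print(shard, copies)
```

With a replication factor of 2 on the same three nodes, each shard would live on only two of them, which shows that the replication factor and the number of nodes are independent settings.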

Tip: In the preceding diagram, the replication factor is 3 and it is equal to the number of nodes. Keep in mind that this is a coincidence; those measures could be different, and they actually depend on your cluster configuration and needs.

This replication feature satisfies three important nonfunctional requirements: load balancing, high availability, and backup. We have already described how the classic replication mechanism provides load balancing. Having the same data within more than one node allows a searcher to issue query requests to those nodes in a round-robin fashion, thus expanding the overall capacity of the system in terms of queries per second. Here, the context is the same; each shard, regardless of whether it is a leader or a replica, can be found on n nodes (where n is the replication factor); therefore, a client can use those nodes for load balancing requests.

High availability is a direct consequence of the redundancy introduced with shard replication. The presence of the same data (and the same search services) on several nodes means that, even if one of those nodes crashes, a client can continue to send requests to the remaining nodes.
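Round-robin load balancing with failover can be sketched as follows. `RoundRobinClient` is a hypothetical class, node names are placeholders, and a real client would discover crashed nodes through failed requests or Zookeeper rather than through a manually maintained set.

```python
from itertools import cycle


class RoundRobinClient:
    """Pick the next live node in turn, skipping nodes marked as down."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._ring = cycle(self.nodes)
        self.down = set()      # nodes currently considered crashed

    def next_node(self):
        # Try each node at most once per call.
        for _ in range(len(self.nodes)):
            node = next(self._ring)
            if node not in self.down:
                return node
        raise RuntimeError("no live nodes available")


if __name__ == "__main__":
    client = RoundRobinClient(["node1", "node2", "node3"])
    print([client.next_node() for _ in range(4)])
    client.down.add("node2")   # simulate a crash
    print([client.next_node() for _ in range(4)])
```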

The redundancy introduced with the replication also works as a backup mechanism. Having the same things in several places provides a better guarantee against data loss. After all, this is the underlying principle of the popular cloud data services (for example, Dropbox, iCloud, and Copy).

Durability and recovery

Each node maintains a write-ahead transaction log, where any change is recorded before being applied to the index. Therefore, the transaction log is available for leaders and replicas, and it will be used to determine which content needs to be part of a chosen replica during synchronization. For instance, when a new replica is created, it refers to its leader and its transaction log to know which content to get.

The transaction log will also be used when restarting a server that didn’t shut down gracefully. Its content will be “replayed” in order to synchronize local leaders and replicas.
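The write-ahead principle (record first, apply second, replay after a crash) can be illustrated with a toy model. This sketch is not Solr's transaction log implementation: `Node` is a hypothetical class, the log is an in-memory list standing in for a durable file, and the index is a plain dictionary.

```python
class Node:
    """Toy node with a write-ahead log and a volatile index."""

    def __init__(self):
        self.tlog = []     # stands in for a durable, append-only log file
        self.index = {}    # stands in for index state that can be lost

    def update(self, doc_id, doc):
        self.tlog.append(("add", doc_id, doc))    # 1. record the change
        self.index[doc_id] = doc                  # 2. apply it

    def delete(self, doc_id):
        self.tlog.append(("delete", doc_id, None))
        self.index.pop(doc_id, None)

    def crash(self):
        # Simulate a non-graceful shutdown: the index state is lost,
        # the log survives.
        self.index = {}

    def replay(self):
        """Rebuild the index by re-applying every logged operation in order."""
        for op, doc_id, doc in self.tlog:
            if op == "add":
                self.index[doc_id] = doc
            else:
                self.index.pop(doc_id, None)


if __name__ == "__main__":
    node = Node()
    node.update("9920", {"brand": "Fender"})
    node.update("4392", {"brand": "Music Man"})
    node.delete("4392")
    node.crash()
    node.replay()
    print(node.index)
```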

Tip: Write-ahead logging is widely used in distributed systems. For more information about it, see https://cwiki.apache.org/confluence/display/solr/NRT%2C+Replication%2C+and+Disaster+Recovery+with+SolrCloud

The transaction log path can be configured in an appropriate section of the solrconfig.xml file.

The new terminology

Now that the main features of SolrCloud have been explained, we can stop thinking about it as an evolution of the shard scenario and cover its own terminology:

Term    Description

Node    This is a Java Virtual Machine running Solr.

Cluster    A set of Solr nodes that form a single unit of service.

Shard    We previously defined a shard as a vertical subset of the index, that is, a subset of all documents in the index. A shard is a single copy of that subset. In SolrCloud, it can be a leader or a replica.

Partition/slice    A subset of the whole index replicated on one or more nodes. A slice is basically composed of all shards (leader and replicas) belonging to the same subset.

Leader    Each shard has one node identified as its leader. This role is crucial for the update workflow. All the updates belonging to a partition route through the leader.

Replica    The replication factor determines the total number of copies each shard has. Among all of those copies, one is elected as the leader, while the others are called replicas. While querying can be done across all shards, updates are always directed (or forwarded by replicas) to leaders.

Replication factor    The number of copies of a shard (and hence, of a document) maintained by the cluster.

Collection    A core that is logically and physically distributed across the cluster. In our example, we have only one collection (C1).

Administration console

In a SolrCloud deployment, the administration console of each node will report an additional menu item called Cloud, where it’s possible to get an overall view of the cluster. You can choose between several graphic representations of the cluster (tree, graph, and radial), but all of them have a common aim: giving an immediate overview of the cluster in terms of nodes, shards, and collections. This is a screenshot from the administration console of the SolrCloud used in this section:

Collections API

The Collections API is used to manage the cluster, including collections, shards, and metadata about the cluster. This interface is composed of a single HTTP service endpoint located at http://<hostname>:<port>/<contextroot>/admin/collections.

The Collections API accepts an action parameter, which is a mnemonic code associated with the command that we want to execute. Each command has its own set of parameters that depend on the goal of the command. The following table lists the allowed values for the action parameter (that is, the available commands):

Action    Description

CREATE    Creates a new collection.

RELOAD    Reloads a collection. This is used when a configuration has been changed in ZooKeeper.

DELETE    Deletes a collection.

LIST    Returns the names of the collections in the cluster.

CREATESHARD    Creates a new shard.

SPLITSHARD    Splits an existing shard into two new shards.

DELETESHARD    Deletes an inactive shard.

CREATEALIAS    Creates or replaces an alias for an existing collection.

DELETEALIAS    Deletes an alias.

ADDREPLICA    Adds a new replica for a given shard.

DELETEREPLICA    Deletes a replica of a shard.

CLUSTERPROP    Adds, edits, or deletes a cluster property.

MIGRATE    Moves documents between collections.

ADDROLE    Adds a role to a node. At the time of writing this book, the only supported role is an overseer. This is the cluster leader responsible for shard assignments and node management operations.

REMOVEROLE    Removes a role from a node.

OVERSEERSTATUS    Returns the current status of the overseer, including some stats about services calls (for example, create collection and create shard).

CLUSTERSTATUS    Returns the cluster status, including shards, collections, replicas, aliases, and cluster properties.

REQUESTSTATUS    Returns the status of those requests that have been executed asynchronously (for example, MIGRATE, SPLITSHARD, and CREATECOLLECTION).

ADDREPLICAPROP    Adds or replaces a replica property.

DELETEREPLICAPROP    Deletes a replica property.

BALANCESHARDUNIQUE    Distributes a given property evenly among the physical nodes that make up a collection.

The complete list of parameters for each command is available at https://cwiki.apache.org/confluence/display/solr/Collections+API.
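Since every command goes through the same endpoint, a request is just the endpoint plus an action and its parameters. The following sketch builds such URLs without sending them; `collections_api_url` is a hypothetical helper, the base URL is an assumption about a local installation, and the CREATE parameters shown (name, numShards, and replicationFactor) should be checked against the reference page above for the Solr version in use.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, action, **params):
    """Build a Collections API URL for the given action and parameters."""
    query = {"action": action, **params}
    return f"{base_url}/admin/collections?{urlencode(query)}"


if __name__ == "__main__":
    base = "http://localhost:8983/solr"   # assumed host, port, and context root
    print(collections_api_url(base, "CREATE", name="c1",
                              numShards=3, replicationFactor=3))
    print(collections_api_url(base, "CLUSTERSTATUS"))
```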

Distributed search

Queries can be sent to any node, which performs a full distributed search across the cluster with load balancing and failover. SolrCloud also allows partial queries, that is, queries executed against a group of shards, a list of servers, or a list of collections.

Tip: If you are using Java on the client side, CloudSolrServer in Solrj completely simplifies communication between the client, Zookeeper, and the cluster. As a developer, you will work with the usual SolrServer interface.

Cluster-aware index

A drawback of the first distributed scenario we met (that is, shards) was that a client that wants to issue an update request needs to explicitly point to the target shard. This is no longer valid in a SolrCloud context because, for a given shard, there could be more than one copy (that is, a leader and zero or more replicas). So the update path becomes the following:

Updates can be sent to any node in the cluster

If the target node is the leader of the shard owning the document, the update is executed there, and then it is forwarded to all replicas

If the target node is a replica, then the update request is forwarded to its leader, and the flow described in the previous point applies
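The update path can be modeled for a single shard as follows. This is a toy sketch, not Solr code: `ShardCopy` is a hypothetical class, documents are kept in a dictionary, and forwarding is a direct method call instead of an HTTP request. Routing the document to the owning shard is left out.

```python
class ShardCopy:
    """One copy of a shard: either the leader or a replica of a leader."""

    def __init__(self, name, leader=None):
        self.name = name
        self.leader = leader   # None means this copy is the leader
        self.replicas = []
        self.docs = {}
        if leader is not None:
            leader.replicas.append(self)

    def update(self, doc_id, doc):
        """Accept an update on any copy; return the name of the applying leader."""
        if self.leader is not None:
            # A replica never applies the update itself first;
            # it forwards the request to its leader.
            return self.leader.update(doc_id, doc)
        # The leader applies the update, then forwards it to all replicas.
        self.docs[doc_id] = doc
        for replica in self.replicas:
            replica.docs[doc_id] = doc
        return self.name


if __name__ == "__main__":
    leader = ShardCopy("node1")
    replica_a = ShardCopy("node2", leader)
    replica_b = ShardCopy("node3", leader)
    # The client talks to a replica; the leader still coordinates the update.
    print(replica_b.update("9920", {"brand": "Fender"}))
    print(leader.docs == replica_a.docs == replica_b.docs)
```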

Tip: The CloudSolrServer in Solrj asks Zookeeper about the leader’s location before sending updates. Thus, requests are always targeted at leaders, avoiding additional network round-trips.


Summary

In this chapter, we described various ways in which you can deploy Solr. Each deployment scenario has specific features, advantages, and drawbacks that make a choice ideal for one context and bad for another. A good thing is that the different scenarios are not strictly exclusive; they follow an incremental approach. In an ideal context, things should start immediately with the perfect scenario that fits your needs. However, unless your requirements are clear right from the start, you can begin with a simple configuration and then change it, depending on how your application evolves.

In the next chapter, we will walk through some useful add-ons that are not part of the core distribution but are included in the Solr download bundle.


Chapter 7. Solr Extensions

Every popular open source project usually includes a contrib folder containing several extra modules to solve common use case implementation problems. In Solr, you can find such modules within the download bundle, as depicted in the following screenshot:

Suppose your data is in a relational database, an XML file with a custom format, or a mail server; you need to index data coming from a Content Management System (such as Drupal, Joomla!, or WordPress); or you have rich documents (such as PDFs or Microsoft Office documents) and you want to do some kind of automatic keyword extraction. In general, these requirements are not covered by the core part of Solr. You will have to plug in and configure those contribution modules.

The aim of this chapter is to describe such modules. In order to do that, we will make use of a preloaded sample Solr instance, with those extensions. To start this instance, you have to check out the source project associated with the chapter, change the directory to the ch7 folder, and type this from the command line:

# mvn clean package cargo:run

If you checked out the project using Eclipse, you might have noticed that, under the src/dev/eclipse folder, there is a preconfigured launcher. Right-click on it and choose the Debug as… menu item.

Regardless of the way you choose, you will see something like this at the end:

[INFO] Jetty 8.1.15.v20140411 Embedded started on port [8983]
[INFO] Press Ctrl-C to stop the container…

This means that the sample instance is up and running. This chapter will cover the following points:

Importing data from several data sources

Text and metadata extraction from digital documents

Language identification

Solritas (that is, Solr and Velocity)

Other contrib modules

DataImportHandler

The DataImportHandler is a module that enables Solr to load data from several types of data sources. The most frequent type of storage where applications put their data is undoubtedly a relational database, but in general, we could have a lot of scenarios here: filesystems, websites, emails, FTP servers, LDAP, NoSQL databases, and so on.

The DataImportHandler module, other than providing a lot of ready-to-use connectors, is an extensible framework where developers are free to inject their storage-specific connector logic. The configuration happens in two different places: the first is the solrconfig.xml file (as usual), where the handler is declared as follows:

<requestHandler name="/import"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dih-config.xml</str>
  </lst>
</requestHandler>

The second is the handler configuration file (in the preceding example, we called it dih-config.xml). Although the specific content of that file could vary, mainly depending on the kind of data source we are using, the building blocks of a DataImportHandler domain are data sources, documents, entities, fields, transformers, and processors.

Data sources

A data source is a collection of records that store data. Although you are probably thinking of relational databases, data sources can also be associated with other kinds of sources and protocols, such as websites (HTTP), FTP servers, LDAP, mail servers, and so on.

A data source declaration is probably the first thing you will meet in a DataImportHandler configuration file. First of all, you must declare where your data is:

<dataSource
    type="JdbcDataSource"
    driver="com.mysql.jdbc.Driver"
    url="jdbc:mysql://host/database-name"
    user="database_username"
    password="database_password"/>

<dataSource
    type="FileDataSource"
    encoding="UTF-8"/>

Note that it’s possible to declare more than one data source (for example, a database and a filesystem or two different databases). Each data source has its own specific properties that depend on its nature. The following table describes the available data sources:

Name    Description

JdbcDataSource    This connects to a database (a direct connection or JNDI data source) using a JDBC driver. Note that Solr doesn’t come with any JDBC driver shipped. You must obtain it separately and put that library under the server classpath or under the core lib folder.

URLDataSource    Reads character files using HTTP.

BinURLDataSource    Reads binary files using HTTP.

FileDataSource    Reads from local character files.

BinFileDataSource    Reads from local binary files.

ContentStreamDataSource    Reads from the ContentStream of a POST request using java.io.Reader.

BinContentStreamDataSource    Reads from the ContentStream of a POST request using java.io.InputStream.

FieldReaderDataSource    Used in conjunction with other data sources, when a given field contains text that needs further processing (for example, when it contains an XML document).

FieldStreamDataSource    Used in conjunction with other data sources when a given field contains binary content that needs further processing (for example, when it contains the value of a BLOB database column).

Documents, entities, and fields

Mapping between external data and Solr is done using documents, entities, and fields.

A document represents a logical type (such as products, books, and associations). It contains one or more entities.

Entities are called root entities or sub-entities depending on their nesting level. Root entities are direct children of a document. Sub-entities are children of another entity. They have a relationship with their parents; within their configuration, it’s possible to use an expression language to refer to their parents.

Fields are concrete places where the mapping between the external data source and Solr document occurs. The following figure schematizes these relationships:

A single document can have one or more root entities. Each entity defines the logic to gather its data and populate its fields.

In the following example, a Solr schema contains books. Each book consists of an identifier (id), a title (title), and one or more authors. There are two database tables, BOOKS and AUTHORS, with a 1:n relationship (this means that a book can have more than one author).

First, let’s see how the root entity (the book) is defined:

<document name="books">
  <entity name="book" dataSource="my-ds"
      query="SELECT BOOK_ID, TITLE FROM BOOKS" onError="skip">
    <field column="BOOK_ID" name="id"/>
    <field column="TITLE" name="title"/>
  </entity>
</document>

As you can see, the entity is associated with a data source called my-ds. It is configured with a query, and for each record of the resulting ResultSet, we are interested in two fields: BOOK_ID and TITLE. They are mapped with the id and title fields in the Solr schema.

Tip: If the name of the column (or the alias) in ResultSet coincides with the name of the Solr field (case insensitive), the <field> declaration can be omitted. Solr will perform the mapping automatically. So, in the preceding example, the TITLE mapping can be removed.

Now, since the cardinality of the relationship between books and authors is 1:n, we need to define a sub-entity. For each book, we must query the data source again to find the corresponding authors:

<entity name="book" dataSource="my-ds"
    query="SELECT BOOK_ID, TITLE FROM BOOKS" onError="skip">
  <field column="BOOK_ID" name="id"/>
  <field column="TITLE" name="title"/>
  <entity name="author" dataSource="my-ds"
      query="SELECT NAME FROM AUTHORS WHERE BOOK_ID = ${book.BOOK_ID}">
    <field column="NAME" name="author"/>
  </entity>
</entity>

The author sub-entity declares a query on the AUTHORS table. It uses a simple expression language to refer to the identifier of the current (parent) book:

${<parent entity name>.<database alias or column name>}

Obviously, this is a really simplified example. In a real production scenario, you will probably meet complicated relational schemas, but the DataImportHandler logic will always be the same: detect and configure entities or fields in order to denormalize your data model.
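The denormalization performed by a root entity and its sub-entity can be reproduced with an in-memory SQLite database. This sketch only imitates the logic and does not use the DataImportHandler: `import_books` and `sample_database` are hypothetical functions, the sample rows are invented, and a bound parameter plays the role of the ${book.BOOK_ID} expression.

```python
import sqlite3


def import_books(conn):
    """Return one flat document per book, with its authors as a list."""
    docs = []
    # Root entity: one row per book.
    books = conn.execute("SELECT BOOK_ID, TITLE FROM BOOKS").fetchall()
    for book_id, title in books:
        # Sub-entity: queried again for each parent row.
        rows = conn.execute(
            "SELECT NAME FROM AUTHORS WHERE BOOK_ID = ?", (book_id,)
        ).fetchall()
        docs.append({"id": book_id, "title": title,
                     "author": [name for (name,) in rows]})
    return docs


def sample_database():
    """Create an in-memory database with illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE BOOKS (BOOK_ID INTEGER PRIMARY KEY, TITLE TEXT)")
    conn.execute("CREATE TABLE AUTHORS (NAME TEXT, BOOK_ID INTEGER)")
    conn.executemany("INSERT INTO BOOKS VALUES (?, ?)",
                     [(1, "Book A"), (2, "Book B")])
    conn.executemany("INSERT INTO AUTHORS VALUES (?, ?)",
                     [("Author One", 1), ("Author Two", 1), ("Author Three", 2)])
    return conn


if __name__ == "__main__":
    for doc in import_books(sample_database()):
        print(doc)
```

Note that the sub-entity query runs once per book, which is the reason large imports with nested entities can become expensive.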

Transformers

A transformer is a function associated with an entity (root or nested) that can manipulate the fields fetched by the entity itself. The transformer must be declared as an attribute of the target entity:

<entity name="author" transformer="script:createAuthorFullName">

The corresponding function will be called for each set of fields (record) fetched by the query associated with the entity. The function has complete control over the fetched record. It can remove, add, or replace fields.

In the previous example, the Solr schema includes an author field that is supposed to hold the complete name of the author (for example, Dante Alighieri). Now let’s imagine that the AUTHORS table contains two separate columns instead: FIRST_NAME and LAST_NAME. With the help of a built-in script transformer, we can write a simple JavaScript function to combine the two fields:

<script><![CDATA[
  function createAuthorFullName(record) {
    // remove() returns the value of the removed column
    var first = record.remove('FIRST_NAME');
    var last = record.remove('LAST_NAME');
    // join first and last name with a single space
    record.put('author', first + ' ' + last);
    return record;
  }
]]></script>

Note how we manipulated the current record by adding a new field (author) and removing the LAST_NAME and FIRST_NAME fields.
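The effect of the transformer on a single record can be checked outside Solr. The following Python function is only an equivalent of the JavaScript logic above, applied to a dictionary that stands in for the record; the BOOK_ID value is invented.

```python
def create_author_full_name(record):
    """Replace FIRST_NAME and LAST_NAME with a single author field."""
    first = record.pop("FIRST_NAME")   # pop() removes the key and returns its value
    last = record.pop("LAST_NAME")
    record["author"] = f"{first} {last}"
    return record


if __name__ == "__main__":
    row = {"BOOK_ID": 1, "FIRST_NAME": "Dante", "LAST_NAME": "Alighieri"}
    print(create_author_full_name(row))
```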

The following table lists the available built-in transformers:

Name    Description

ScriptTransformer    Executes a function written in JavaScript or another scripting language supported by Java.

DateFormatTransformer    Creates java.util.Date instances from string literals.

HTMLStripTransformer    Strips off HTML tags from field values.

LogTransformer    Logs messages using a given template.

NumberFormatTransformer    Creates number instances from string literals.

RegexTransformer    Uses regular expressions to manipulate data in fields.

TemplateTransformer    Puts values in a column by resolving an expression containing other columns. For example, the concatenation we got with the ScriptTransformer can also be done using this transformer:

<field name="author" template="${author.FIRST_NAME} ${author.LAST_NAME}"/>

A transformer is simply a class that extends org.apache.solr.handler.dataimport.Transformer so, if the built-in portfolio doesn’t meet your needs, it is always possible to create a custom implementation.

Entity processors

Each entity is handled by a so-called EntityProcessor that defaults to SqlEntityProcessor. This is because the relational database is the most popular type of data source.

However, when using a different data source such as HTTP, files, or streams, the entity management logic will have its own specific requirements that most probably fall outside the area covered by SqlEntityProcessor. In these cases, you can override the default settings by explicitly declaring an EntityProcessor for a given entity.

As usual, there are a lot of built-in EntityProcessor implementations, but it is always possible to create a custom implementation by extending the org.apache.solr.handler.dataimport.EntityProcessor class.

The following table lists and describes available entity processors:

Name    Description

SqlEntityProcessor    This is the default entity processor assigned to each entity. It provides support to read and cache data from databases. It is used in conjunction with JdbcDataSource.

FileListEntityProcessor    Enumerates the list of files from a filesystem based on criteria specified in the associated entity (for example, base path, recursive, and filename pattern).

LineEntityProcessor    Reads from a data source on a line-by-line basis and produces a field called rawLine for each line read.

MailEntityProcessor    Handles emails and attachments from POP3 or IMAP sources.

PlainTextEntityProcessor    Reads from a data source and returns a field called plainText. This field contains a string representing the source content.

SolrEntityProcessor    Reads values from another Solr instance using Solrj. Each returned record is a SolrDocument instance.

TikaEntityProcessor    Extracts metadata and text from rich documents by means of Apache Tika. Later, we will see the Content Extraction Library, which also uses Tika as the extraction engine.

XPathEntityProcessor    Uses a streaming XPath parser to extract values from XML documents.

Event listeners

The document element in the DataImportHandler configuration allows us to declare two event listeners to intercept the most relevant events of a data import lifecycle, onImportStart and onImportEnd:

<document
    onImportStart="com.foo.MyImportStartEventListener"
    onImportEnd="com.foo.MyImportEndEventListener">

The event listeners must implement the org.apache.solr.handler.dataimport.EventListener interface, which gives them access (by means of an org.apache.solr.handler.dataimport.Context instance) to most DataImportHandler objects and event statistics such as documents skipped, indexed, failed, and so on.


Content Extraction Library

The Content Extraction Library (also known as Solr Cell) integrates the popular Apache Tika framework to detect and extract metadata and text from a large variety of file types such as PDF, Microsoft Office, LibreOffice, and OpenOffice documents.

Apache Tika provides a façade parser interface on top of several low-level frameworks that are able to manage and manipulate specific file types (for example, PDFBox for PDFs and Apache POI for Microsoft documents). Its simple interface also provides automatic mime-type detection, so the framework itself is able to understand the correct parser that needs to be applied for a given file.
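The façade idea (detect the type first, then delegate to a type-specific parser) can be illustrated with a toy dispatcher based on the first bytes of a file. This is not Tika's detection algorithm or API: `detect_type`, `choose_parser`, and the type labels are hypothetical, only three well-known file signatures are checked, and the parser names are plain strings taken from the text above.

```python
# Well-known leading byte sequences ("magic numbers") and an illustrative label.
SIGNATURES = [
    (b"%PDF-", "pdf"),                              # PDF documents
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy Microsoft Office files
    (b"PK\x03\x04", "zip"),                         # ZIP containers (for example, ODF)
]

# Which low-level library a facade could delegate to (labels only).
PARSERS = {"pdf": "PDFBox", "ole2": "Apache POI"}


def detect_type(head: bytes) -> str:
    """Return a type label based on the first bytes of a file."""
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return "unknown"


def choose_parser(head: bytes) -> str:
    """Single entry point: detect the type, then pick the matching parser."""
    return PARSERS.get(detect_type(head), "no dedicated parser in this sketch")


if __name__ == "__main__":
    print(choose_parser(b"%PDF-1.4\n..."))
    print(choose_parser(b"plain text"))
```

The caller only sees `choose_parser`; adding support for a new format means extending the two tables, not changing the calling code.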

On the Solr side, a dedicated ExtractingRequestHandler will be in charge of getting the input data (files) sent by clients and extracting metadata and text by means of Tika.

The configuration of ExtractingRequestHandler follows the same procedure that we saw for the other handlers. Specifically, it has to be declared in solrconfig.xml, as follows:

<requestHandler name="/update/extract"
    class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
  </lst>
</requestHandler>

Solr Cell has several options that can be configured to fine-tune its behavior. Most of them are related to metadata handling, field name mapping, and custom Tika configuration.

Tip: For a complete list of all configuration parameters, go to https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika

The src/solr/solr-home/example-data folder in the example project contains a document that can be sent to Solr Cell. Open a shell and type the following (replace the PROJECT_HOME placeholder with your ch7 project local path):

# curl "http://localhost:8983/solr/example/update/extract?commit=true" -F data=@PROJECT_HOME/ch7/src/solr/solr-home/example-data/libreoffice-writer.odt

Wait for a moment, and then you should see a response like this:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">572</int>
  </lst>
</response>

The document (the LibreOffice document in this case, but you can also try other files) has been indexed. You can see that, when you open the browser and type http://127.0.0.1:8983/solr/example/select?q=stream_name:libreoffice-writer.odt&indent=true, the XML response shows the extracted text (under the text attribute) and all the metadata fields that have been detected for that document.


Language Identifier

The language Identifier extension detects the language (or languages) of fields belonging to a given document. This is a very useful add-on to use in conjunction with the previously described extraction library, to get additional information about data that has been indexed.

The component is implemented as an UpdateRequestProcessor subclass that intercepts and analyzes the incoming data:

<processor
    class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory">
  <str name="langid.fl">text</str>
  <str name="langid.langField">language</str>
  <str name="langid.fallback">en</str>
</processor>

As you can see, this processor can be configured with several options. We can declare the fields of the incoming documents that must be analyzed, the name of the field that will hold the results of language detection, and a default fallback language in case no detection is possible.

Tip: In the example project associated with this chapter, you will find a solrconfig.xml file where the chain is already defined but the UpdateRequestProcessor is commented out. Just remove the comment markers, reload the core using the Administration Console, and reindex the documents under the example-data folder, following the same procedure as we described in the previous section. At the end, you will see an additional “language” field in each document; that is the result of the language detection component.

You should know that declaring the processor within the solrconfig.xml file is not enough. We need to insert that into an update request processor chain, and finally associate that chain with an UpdateRequestHandler. Only those update requests that will be received by that handler will pass through the language detection analysis chain.


Rapid prototyping with Solritas

Solritas is the name of a contribution module that integrates Solr with Apache Velocity. It is basically a response writer that uses the Apache Velocity template engine to render Solr responses with a graphical user interface.

A set of ready-to-use Velocity templates is combined with Solr responses in order to provide a search GUI with a lot of features (for example, faceting, highlighting, and autocompletion).

Tip: You can find the Velocity templates under the src/solr/solr-home/example/conf/velocity folder of the ch7 project, or under the example/solr/collection1/conf/velocity folder of the Solr download bundle.

As this GUI is directly provided by transforming the emerging Solr responses, there’s no need for an external web application to execute searches and graphically see the corresponding results.

Okay, one could now say, “This is already possible with the Solr REST services”, but that is definitely more technically complex, and the search results are displayed in XML, JSON, or some other raw format. Here, a more user-friendly interface is provided, as shown in the following screenshot:

That makes Solritas an ideal choice to build rapid prototypes. The sample instance you started at the beginning of this chapter has Solritas configured in solrconfig.xml. It responds to the /solritas endpoint, so after indexing some data from the previous paragraph, open your browser and type http://127.0.0.1:8983/solr/example/solritas.

Tip: The Velocity templates have been copied from the Solr download bundle, so some areas (such as Google Maps widgets, spatial queries, and range queries) might not be visible or might not make sense with the chapter’s sample data. If you want to see all of them in action, just start the Solr example in the download bundle and navigate to http://127.0.0.1:8983/solr/browse.

You should see the Solritas results page, which is preloaded with a *:* query by default.


Other extensions

The contrib folder contains other modules or plugins that are briefly described in the following sections.

Clustering

The clustering module is a framework used to plug in third-party (clustering) implementations. At the time of writing this book, it provides support for clustering search results using the Carrot2 project.

The Solr example that comes with the download bundle already contains a ClusteringComponent within the solrconfig.xml configuration file. The declaration happens in two phases. First, the component has to be configured:

<searchComponent
    name="clustering"
    enable="${solr.clustering.enabled:false}"
    class="solr.clustering.ClusteringComponent">
  <lst name="engine">
    <str name="name">lingo</str>
    <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
    <str name="carrot.resourcesDir">clustering/carrot2</str>
  </lst>
</searchComponent>

After this, as with any other SearchComponent, you should enable it by including its name in the RequestHandler instance where it is supposed to run:

<requestHandler name="/myRequestHandler" class="solr.SearchHandler">
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>

In this way, it can contribute to search results by adding a “clusters” section, like this (abridged):

<response>
  <result>
    ...
  </result>
  <arr name="clusters">
    <lst>
      <arr name="labels">
        <str>iPod</str>
      </arr>
      <double name="score">1.3174612693376382</double>
      <arr name="docs">
        <str>F8V7067-APL-KIT</str>
        <str>IW-02</str>
      </arr>
    </lst>
    <lst>
      <arr name="labels">
        <str>Hard Drive</str>
      </arr>
      ...
    </lst>
  </arr>
</response>


If you want to try this yourself, open a shell and type the following commands:

# cd $INSTALL_DIR/example
# java -Dsolr.clustering.enabled=true -jar start.jar

These will start Solr with the ClusteringComponent enabled. Now, on another shell type this:

# cd $INSTALL_DIR/example/exampledocs
# ./post.sh *.xml

Finally, open a browser and execute this query: http://localhost:8983/solr/clustering?q=*:*&rows=10

You should get a response similar to the preceding example, with the “clusters” section at the bottom.


UIMA Metadata Extraction Library

This module integrates Apache UIMA in Solr by providing a powerful Metadata Extraction Library that can be used for tasks such as automatic keyword extraction and Named Entity Recognition (for example, places, names, concepts, and dates).

The plugin can be provided both as an UpdateRequestProcessor subclass, to decorate the index process chain, or as a set of Tokenizers/Filters, to add such behavior in the (index or query) text analysis phase.

Using this module, you can enrich your Solr documents with additional metadata information extracted from the input data. UIMA provides an analysis engine that involves several components arranged in a pipeline. The default pipeline supports the use of existing analysis engines such as Alchemy or OpenCalais. Keep in mind that these engines are not free of charge, but they provide a free trial period. You can register and obtain an API key that must be configured in the solrconfig.xml file. Other components are used for language and sentence detection.

Note: Under the contrib/uima folder, you will find a README file with detailed information about the Solr UIMA module usage.

The UIMA UpdateRequestProcessor intercepts the documents that are being indexed and sends them to its analysis pipeline. Those documents will be automatically enriched with extracted information such as sentences, languages, or named entities (for example, places or names).


MapReduce

The MapReduce contrib module provides integration with Apache Hadoop. MapReduce is the name of a paradigm (programming model) that is implemented in Apache Hadoop to process large datasets with a parallel and distributed algorithm.

The contribution contains a MapReduce job to build Solr indexes and merge them into a Solr cluster.


Summary

In this chapter, we illustrated a set of contribution modules that are not part of the Solr core but are definitely useful in a lot of real scenarios. The Solr download bundle contains all of them, and their installation is very easy. Each module folder has a README file that guides you through installation and setup steps (basically, it’s just a matter of copying, pasting, and configuring).

In the next chapter, we will conclude our Solr path with an overview of the Solr codebase. You will learn how to work with it and eventually how to contribute to the open source community process.


Chapter 8. Contributing to Solr

A friend of mine used to say, “Is there a better way to start a new year than contributing to an open source project?” I strongly agree; a great way to get involved in the open source world is to contribute to the projects you’re using.

As a user of open source software, you are already part of that world, an important part that makes that software useful. But there’s more; you can delve more deeply into what actually happens behind the scenes.

By the end of this chapter, you will have a good understanding of the following topics:

The constituent pieces of the open source world
The Apache contribution process
How to work with Solr source code in your IDE


Identifying your needs

Why are you interested in the open source contribution process? Why do you want to have the Solr source code in your IDE? These are crucial questions you should answer before doing all that is described in this chapter. In my opinion, you could fall under one of these scenarios:

Curiosity: You want to inspect and see with your eyes how things are working behind the scenes.
Bug fixing: You want to fix a bug that you met in your Solr installation. In this way, you will satisfy your customer and the community will benefit from your work.
Improvement: You’ve got an idea about an interesting feature not yet implemented. Probably, a customer requirement led to that idea, and you believe that it could be useful for other users if (once implemented) it were integrated in Solr.
Wanting to contribute: You simply want to contribute by fixing an existing issue and participating in the development/contribution process.

While curiosity could be a good reason to start investigating source code, sooner or later (and I would add most probably), you will fall into one of the other categories. At that time, you will necessarily start communicating with other people and the communities associated with the project.

Tip: You can find a general introduction to the Apache contribution process at http://www.apache.org/foundation/getinvolved.html.

That interaction will involve some general aspects such as issue tracking, mailing lists, software development, and so on. Once you have identified your needs and goals, you can look at the upcoming sections to get a description of those cross-cutting concepts.


An example – SOLR-3191

In 2013, I was working on an Online Public Access Catalogue (OPAC) project for a big library. The schema definition became huge very soon, because MARC, the standard representation for bibliographic records, is an old and proven standard that classifies each minimal piece of information about a catalog item.

Obviously, our customer required all that richness in the search application, so we started with a small schema and quickly ended up with a lot of fields.

Another requirement was the capability to download each item in MARCXML format (MARCXML is the XML representation of a MARC record) in the end user application. So, in order to satisfy that requirement, we put the whole MARC representation in a dedicated stored field called, not surprisingly, marc_xml.

What was the problem? On the Solr side, we defined a lot of SearchHandler instances, one for each kind of search (for example, any keyword, author, title, or subject). As you know, for each handler we have to declare all (stored) fields that must be in the search results using the fl parameter.

In the first approach, we simply put a wildcard (*) as a value for the fl parameter, as most of those fields were needed in the user interface. But after it had been running for a while in production, the IT department, in charge of monitoring the system, raised an issue about the network traffic between the frontend application and the Solr server. After doing some analysis, we discovered a lot of records with a huge marc_xml field returned to the client. “Ok,” said one of the IT guys to us, “just exclude the marc_xml field from the fl parameter”.

The fl parameter accepts a list of fields that must be returned, but there’s no way to tell it what must not be in the search results. Eight handlers were defined in the solrconfig.xml file, and for each of them (later, we discovered the XInclude feature, but that’s another story), we had to declare all stored fields, excluding the marc_xml field. This was terrible and unmaintainable!
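To make that workaround concrete, the fl value for every handler had to be spelled out field by field. The following minimal sketch builds such a list while skipping one field. It is only an illustration: the fl_excluding function and the id, title, and author field names are hypothetical, while marc_xml is the field described above.

```shell
#!/bin/sh
# fl_excluding EXCLUDED FIELD...
# Prints a comma-separated fl value with every FIELD except EXCLUDED.
fl_excluding() {
  excluded="$1"
  shift
  fl=""
  for field in "$@"; do
    if [ "$field" != "$excluded" ]; then
      # Prepend a comma only when the list already has an entry.
      fl="${fl:+$fl,}$field"
    fi
  done
  printf '%s\n' "$fl"
}
```

Calling fl_excluding marc_xml id title marc_xml author prints id,title,author. With dozens of stored fields and eight handlers, every schema change means regenerating each of these lists, which is exactly the maintenance burden that motivated the issue.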

After googling a bit, I found several guys facing the same problem, so I decided to take a look at an existing JIRA issue. Thus, I met the (unsolved) SOLR-3191 issue at https://issues.apache.org/jira/browse/SOLR-3191, which describes the problem:

SOLR-3191 field exclusion from fl

I think it would be useful to add a way to exclude field from the Solr response. If I have for example 100 stored fields and I want to return all of them but one, it would be handy to list just the field I want to exclude instead of the 99 fields for inclusion through fl

So I thought to myself: why don’t you try to implement that feature? And I did what I’m going to describe in this chapter. If you take a look at that issue, you will see I submitted two patches and had some exchange with a couple of Solr guys.


Subscribing to mailing lists

If you haven’t subscribed to a Solr mailing list (or lists) yet, you should do that before going ahead. User and developer lists are the primary place where things such as doubts, questions, features, and bugs are discussed.

It’s mainly there that you should look to solve your problem and meet people with similar requirements. Like any other Apache project, Solr has the following mailing lists:

A user list – solr-user@lucene.apache.org
A dev list – dev@lucene.apache.org
A commits list – commits@lucene.apache.org

Every Solr user should be subscribed to the user list. This usually avoids the need to reinvent the wheel by getting ideas and solutions from users and developers.

The dev list is meant for listening to or participating in discussions on Lucene and Solr internals, developments, upcoming features, and so on. The focus here is more technical.

Finally, the commits list is used to receive notifications about every Solr or Lucene commit.

Subscribing to a list is very easy; just send an empty email to solr-user-subscribe@lucene.apache.org, dev-subscribe@lucene.apache.org, or commits-subscribe@lucene.apache.org, and then follow the procedure written in the answering mail.


Signing up on JIRA

The issue tracker is another important building block of the open source contribution process. Whenever an idea, question, bug, or feature becomes something that could affect the code, a new JIRA issue is filed, and all things related to that (for example, tasks, discussions, patches, code, and commit logs) will be put there.

Issues in JIRA are public, so if you only want to see or read them, there’s no need to have an account (you should have already read the SOLR-3191 issue on JIRA, without having an account).

However, if you want to participate in a discussion, post a patch, or create or update issues, you must sign up at https://issues.apache.org/jira/secure/Signup!default.jspa.

Afterwards, you can sign in using the login form at https://issues.apache.org/jira/login.jsp.

That’s all! Welcome to the Apache Issue Tracker! Note that, before opening a new issue, it is always better to ping the dev list and discuss it. Maybe a similar issue already exists and someone is working on it.


Setting up the development environment

Following the same logic that was used in the previous chapters, I will assume you have Eclipse installed. If that is not the case, that is, if you followed the examples using some other IDE (for example, IntelliJ), a few steps could be a bit different.

In order to be able to modify, build, and run Solr from the source code, you need the following:

An IDE such as Eclipse or IntelliJ
A Subversion client, which can be a standalone client (such as the svn command-line tool or TortoiseSVN) or a plugin in your IDE (for example, Subclipse or Subversive)
Apache ANT (http://ant.apache.org/bindownload.cgi)


Version control

Subversion is an open source version control system that is used to maintain the source code of the Apache projects, including Solr.

As a first step, you need to check out the Solr source code from the SVN repository. Depending on your role, you should point to one of the following addresses:

http://svn.apache.org/repos/asf/lucene/dev/<branch>

https://svn.apache.org/repos/asf/lucene/dev/<branch>

As you can see, the only difference in the preceding links is in the protocol. The first link, which uses http, is for anonymous checkout, and the other, which uses https, is for committers. Committers are those people who have commit rights, that is, active members of the development community with write permissions on the repository. I assume you don’t fall within this last category, so the correct link is the first.

The link also contains a <branch> placeholder. This must be replaced with the correct target version you will work on. That strictly depends on the task you would like to do. If you want to fix a bug in a past version (for example, 4.7.2), you should point to the corresponding branch. If you want to pick up an existing enhancement or bug that has been scheduled for the next major release, you should point to the “trunk” leg. The following table describes how the repository tree is organized (http://svn.apache.org/repos/asf/lucene/dev/):

Folder | Description
branches | Development branches.
branches/branch_5x | The development branch for the next version, 5.x.
branches/lucene_solr_3_6, branches/lucene_solr_4_10 | The development branches for versions that have been released. Apart from some tasks that have been scheduled for a given release, most of the development activities done in these branches are bug fixes.
tags | When a new version is released, the corresponding source code is copied here, in a dedicated folder (for example, tags/lucene_solr_3_6_1 and tags/lucene_solr_4_10_3).
trunk | This is the main center of development.

The target branch depends on what you would like to do. If you pick up an existing JIRA issue, you will find the affected version among its attributes. Besides, you may want to fix an issue in an older version (for example, 3.6.1) because your customer is using that specific version.

Keep in mind that most development tasks are done in the trunk and then ported to the corresponding active development branch (under the branches folder). Anyway, before starting, it is always recommended to ping the dev list explaining what you want to do.
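The mapping from a target version to a repository address can be sketched as a small shell function. This is only an illustration: the svn_url function name is hypothetical, and the naming patterns (branch_5x for the upcoming line, lucene_solr_<major>_<minor> for released lines) are inferred from the folder names listed above rather than taken from an official rule.

```shell
#!/bin/sh
# Anonymous (read-only) base address of the repository.
SVN_BASE="http://svn.apache.org/repos/asf/lucene/dev"

# svn_url TARGET
#   trunk -> <base>/trunk
#   5x    -> <base>/branches/branch_5x
#   4.10  -> <base>/branches/lucene_solr_4_10
svn_url() {
  case "$1" in
    trunk)
      printf '%s/trunk\n' "$SVN_BASE" ;;
    *x)
      printf '%s/branches/branch_%s\n' "$SVN_BASE" "$1" ;;
    *.*)
      # Released lines use underscores instead of dots.
      printf '%s/branches/lucene_solr_%s\n' "$SVN_BASE" "$(printf '%s' "$1" | tr '.' '_')" ;;
    *)
      echo "unknown target: $1" >&2
      return 1 ;;
  esac
}
```

The result can be passed straight to the Subversion client, for example svn checkout "$(svn_url 5x)" solr_5. It is still worth browsing the repository first, because the function does not check that the branch actually exists.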


Code style

One of the common problems in distributed development is agreeing on source code formalisms: comments, naming conventions, and so on.

That’s the reason the Solr development team provided two useful configuration files: one for Eclipse and another for IntelliJ. These files can be imported into those IDEs to automate a lot of things such as indentation, brace positions, line wrapping, comments, and so on.

Pick up that file from one of the following addresses, depending on your favorite IDE:

Eclipse: http://people.apache.org/~rmuir/Eclipse-Lucene-Codestyle.xml
IntelliJ: http://people.apache.org/~erick/Intellij-Lucene-Codestyle.xml

In Eclipse, the configuration file can be imported by going to Window | Preferences | Java | Code Style | Formatter and then clicking on the Import button, as shown in the following screenshot:

After that, navigate to Java | Editor | Save Actions. Select the Perform the selected actions on save checkbox and the Format edited lines radio button, as shown in this screenshot:


Checking out the code

Once you have identified the target branch to work on, check out the source code using the svn command-line tool or your favorite tool (for example, TortoiseSVN).

SOLR-3191 was considered a new feature at that time, so I checked out the trunk. The current trunk requires Java 8 in order to build, so to execute the steps needed in this chapter, let’s point to a different branch (5_x). Open a shell and type the following commands:

# cd /work/solrdev
# svn checkout http://svn.apache.org/repos/asf/lucene/dev/branches/branch_5x solr_5

Bear in mind the following:

I’m not a committer, so I pointed to the read-only (http) address.
The name of the local folder that will contain the downloaded source is solr_5. If it doesn’t exist, it will be automatically created.
The /work/solrdev/solr_5 folder is a local working folder on my machine. You can choose whatever name you like.

When you execute that command, a lot of files will be downloaded. In the end, you should see something like this:

A    solr_5/solr/test-framework/src/java/overview.html
A    solr_5/.hgignore
 U   solr_5
Checked out revision 1651057.

Now the source code of Solr 5_x is on your machine.


Creating the project in your IDE

Getting the source code is not enough, unless you want to develop your patch using Vim. You will have to create a project in your IDE. Assuming you are in the /work/solrdev/solr_5 folder you created in the previous step, type the following:

# ant clean test

The ant command will immediately fail because the build requires Ivy (a dependency management tool), and you don’t have that on your machine. No problem! There’s a dedicated task that can install Ivy for you. Type this command:

# ant ivy-bootstrap

You should see something like this:

ivy-bootstrap2:
ivy-checksum:
ivy-bootstrap:
BUILD SUCCESSFUL
Total time: 3 seconds

Now we can retry the first command:

# ant clean test

This will execute the whole test suite, which is huge, so take a long coffee break!

Tip: Although this step is not mandatory, it is strongly recommended to check the state of your build before making any change. In this way, you can see whether there’s something failing, something that doesn’t have to do with your changes.
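The bootstrap step is only needed once per machine, so a wrapper script can decide whether to include it. The following is a minimal sketch under a stated assumption: it assumes that ant ivy-bootstrap places an ivy-*.jar file in the ~/.ant/lib folder, which you should verify on your own setup. The ivy_installed and build_steps function names are hypothetical, and the script only prints the ant commands instead of running them.

```shell
#!/bin/sh
# ivy_installed [DIR]
# Succeeds when DIR (default: ~/.ant/lib, an assumed location) holds an ivy-*.jar.
ivy_installed() {
  dir="${1:-$HOME/.ant/lib}"
  for jar in "$dir"/ivy-*.jar; do
    # An unmatched glob stays literal, so test that the file really exists.
    if [ -f "$jar" ]; then
      return 0
    fi
  done
  return 1
}

# build_steps [DIR]
# Prints the ant commands to run, adding the bootstrap step only when needed.
build_steps() {
  if ! ivy_installed "${1:-}"; then
    echo "ant ivy-bootstrap"
  fi
  echo "ant clean test"
}
```

Running build_steps from the solr_5 folder shows which commands are still required; piping its output to sh executes them in order.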

Once the test suite has been executed, type this command if you are using Eclipse:

# ant eclipse

If you are using IntelliJ, type the following command:

# ant idea

This will generate the IDE project files within the current directory (solr_5). From here on, I will assume you’re using Eclipse, but the steps are basically the same for IntelliJ.

Open Eclipse and create a new workspace (you can also use the workspace where you loaded the sample projects of this book).

Open the File menu and choose Import. From the dialog that appears, go to General | Existing Projects into Workspace. Using the Browse button, select the /work/solrdev/solr_5 folder. Press Ok and then Confirm. The dialog will close and the project will be imported, as shown in this screenshot:


Once the project has been built, you shouldn’t have any errors. Everything is ready, and you can proceed with your change.


Making your changes

We won’t dig very deep in this step because it basically depends on the nature of the task you picked up. For instance, my SOLR-3191 patch contains four existing classes that I changed to implement that specific behavior.

Since nobody knows you and your changes will hopefully be integrated in a very popular framework, the most important things to keep in mind are as follows:

Correctness: The implementation must do what it is supposed to do, according to the requirements expressed in the JIRA issue
Documentation: Javadoc at class and method levels (don’t include the @author tag)
Unit tests: These describe and validate your changes

Returning to the SOLR-3191 example, I changed two classes:

org.apache.solr.search.ReturnFields

org.apache.solr.search.SolrReturnFields

These classes contain the logic required by the issue. At the same time, I updated two TestCase classes with several unit tests demonstrating and validating my changes:

org.apache.solr.search.ReturnFieldsTest

org.apache.solr.search.TestPseudoReturnFields

During development, it’s better to periodically execute the test suite, in order to ensure that your changes didn’t introduce any side effects.

Tip: When working in a distributed development environment, it is strongly recommended that you run an svn update command frequently. In this way, you will always be working with the latest version of the branch you checked out.

Okay, take your time and make your changes. Remember to post a message on the issue page in JIRA for every relevant doubt. In this way, the whole history of your work will be in one place.


Creating and submitting a patch

Once the implementation has been completed, everything is working, and the tests are green, it’s time to submit the patch.

Before doing that, open a shell on the /work/solrdev/solr_5 working folder and type this:

# ant precommit

This task will look for problems related to tab indentation, author tags, and broken or wrong links in javadoc. At the end, type the following command:

# svn stat

You will see a list of source files that have been changed. New files that are not yet under version control are marked with a question mark (?). If all of them are associated with your changes, just type this command in order to add them, so that they are included in the patch:

# svn stat | grep "^?" | awk '{print $2}' | xargs svn add

Alternatively, you can add those files one by one, using the following command:

# svn add <file>

Finally, type this command to generate a patch:

# svn diff > /work/patches/SOLR-XXXX.patch

That will create a new file (SOLR-XXXX.patch) under the /work/patches local folder. Here are a couple of things to note:

/work/patches is a sample local directory that I’ve created on my machine. You can put the patch in a different folder.
XXXX is supposed to be replaced with the number of the corresponding JIRA issue. If you are updating an existing patch, the name should always follow this convention because JIRA will take care of highlighting the newest version.
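The two mechanical parts of this procedure, selecting the unversioned files and naming the patch, can be sketched as small shell functions. This is only an illustration: the unversioned_paths and patch_file function names are hypothetical, and the single awk filter is a swapped-in equivalent of the grep and awk pair used above. Like the original pipeline, it assumes paths without spaces.

```shell
#!/bin/sh
# unversioned_paths
# Reads `svn stat` output on stdin and prints the path of every entry whose
# status column is "?" (not under version control).
unversioned_paths() {
  awk '$1 == "?" { print $2 }'
}

# patch_file ISSUE_NUMBER [DIR]
# Prints the patch file name that follows the SOLR-XXXX.patch convention.
# DIR defaults to /work/patches, the sample folder used above.
patch_file() {
  printf '%s/SOLR-%s.patch\n' "${2:-/work/patches}" "$1"
}
```

With these in place, svn stat | unversioned_paths lists what would be added, which is worth reviewing before piping it to xargs svn add, and svn diff > "$(patch_file 3191)" writes the patch for the SOLR-3191 example.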

Tip: If you’ve installed an SVN plugin on your IDE (such as Subclipse or Subversive in Eclipse), you can do everything without using the command line. In Subclipse, for example, there’s a Create Patch command under Team that will guide you through the necessary steps with an easy wizard.

Once you’ve got the patch file, open a browser, log in to JIRA, go to the issue page, and upload the patch. It is recommended you post a comment with information (including a description) about your submission. That’s all! Now you should follow your issue because several things can happen:

The patch is perfect, so it’s just a matter of time and it will be applied.
Some questions come from JIRA users. In that case, you may want to participate in a discussion that might eventually request a new version of the patch.

Anyway, the big part is done! You’ve actively participated in the contribution process, and hopefully your artifact will be integrated with Solr. Congrats!


Other ways to contribute

Besides writing code, there are other ways to participate in an open source project. After all, the software is just a component of a final product. We can find support and documentation, which in most cases make the real difference between a good and a bad product from the user’s perspective.


Documentation

Software quality is described by a combination of several factors: functional and non-functional features, internal and external qualities, and last but not least, documentation.

By “documentation”, I personally mean a complex and huge world made up of different types of information for different types of target audience:

Technical internal documentation: Strictly needed by active developers to inform about the structure or the implementation of the system.
Technical external documentation: Crucial for open source projects representing frameworks, things that can be extended. This is sometimes called the developer guide. This kind of information documents the public API and the extension points that let developers integrate the product with their applications.
User documentation: This enables end users to understand the usage and power of a given system. It is sometimes called a user guide and is the primary source of information for an end user.

Solr has two main places where documentation can be found:

The reference guide, available online at https://cwiki.apache.org/confluence/display/solr/About+This+Guide, or in PDF format
The Solr community Wiki, at https://wiki.apache.org/solr

The first is a guide constituting the official reference documentation. It is created and maintained by Solr committers. On the other hand, the Wiki is a public and collaborative tool. Anyone can potentially edit its content by creating an account and then requesting write grants from the Solr team. For detailed instructions refer to http://wiki.apache.org/solr/#How_to_edit_this_Wiki.


Mailing list moderator

A list moderator is a kind of supervisor for a given mailing list and a user with elevated privileges. The moderator can get a list of all subscribers and manually subscribe or unsubscribe a given user.

The moderator also checks emails sent to the list from addresses that are not subscribed, in order to improve spam filter rules, and helps users who face issues related to the lists (for example, subscribing and unsubscribing).


Summary

In this final chapter, we illustrated the overall contribution process. As with any open source project, the Solr team warmly welcomes any kind of contribution: source code, bug fixing, documentation, and active participation in the mailing lists. There’s no need to be a committer, which would surely be an ambitious goal for a developer. It’s always possible to download the source code, change it, and eventually (if you think the changes could also be useful for other people) create a patch and submit it to the community.


IndexA

addcommandabout/Addsending/Sendingaddcommands

addcommand,XMLformat<add>/AddcommitWithin/Addoverwrite/Add<doc>/Addboost/Add<field>/Add

Alchemyabout/UIMAMetadataExtractionLibrary

alternativequery/Alternativequeryanalyzersections/ThetextanalysisprocessApacheANT

URL/Settingupthedevelopmentenvironmentversioncontrol/Versioncontrol

ApachecontributionURL/Identifyingyourneeds

ApacheHadoopabout/MapReduce

ApachePOI/ContentExtractionLibraryApacheTikaframework/ContentExtractionLibraryApacheUIMA

about/UIMAMetadataExtractionLibraryApacheVelocity

about/RapidprototypingwithSolaritasApacheZookeeper

about/ClustermanagementURL/Clustermanagement

autocommitfeature/Updatehandlerandautocommitfeature


Bbackgroundserver

Solr,runningas/DifferentwaystorunSolr,Backgroundserverbackup

about/Replicationfactor,leaders,andreplicasBooleanfields

about/BooleanBooleanparameters,servicebehavior

waitSearcher/Commit,optimize,androllbackwaitFlush/Commit,optimize,androllbacksoftCommit/Commit,optimize,androllback

Boostqueryparser/Otheravailableparsersbuilt-intransformers

ScriptTransformer/TransformersDateFormatTransformer/TransformersHTMLStripTransformer/TransformersLogTransformer/TransformersNumberFormatTransformer/TransformersRegexTransformer/TransformersTemplateTransformer/Transformers


Ccache

about/CachesFilterCache/CachesQueryResultCache/CachesDocumentCache/CachesFieldCache/CachesFieldValueCache/CachesCustomCache/Cacheslifecycle/Cachelifecyclessizing/Cachesizingobjectslifecycle/CachedobjectlifecycleLRUCache/CachedobjectlifecycleFastLRUCache/CachedobjectlifecycleLFUCache/Cachedobjectlifecyclestats/Cachestatstypes/Typesofcache

cache,statslookups/Cachestatshits/Cachestatshitratio/Cachestatsinserts/Cachestatsevictions/Cachestatssize/CachestatswarmupTime/Cachestatscumulative_lookups/Cachestatscumulative_hits/Cachestatscumulative_hitratio/Cachestatscumulative_inserts/Cachestatscumulative_evictions/Cachestats

cache,typesfiltercache/Filtercachequeryresultcache/QueryResultcachedocumentcache/Documentcachefieldvaluecache/Fieldvaluecachecustomcache/Customcache

Carrot2projectabout/Clustering

changescreating/Makingyourchanges

charfilters/Charfiltersreferencelink/Charfilters

clusteringmodule


about/ClusteringCollectionsAPI,actions

CREATE/CollectionsAPIRELOAD/CollectionsAPIDELETE/CollectionsAPILIST/CollectionsAPICREATESHARD/CollectionsAPISPLITSHARD/CollectionsAPIDELETESHARD/CollectionsAPICREATEALIAS/CollectionsAPIDELETEALIAS/CollectionsAPIADDREPLICA/CollectionsAPIDELETEREPLICA/CollectionsAPICLUSTERPROP/CollectionsAPIMIGRATE/CollectionsAPIADDROLE/CollectionsAPIREMOVEROLE/CollectionsAPIOVERSEERSTATUS/CollectionsAPICLUSTERSTATUS/CollectionsAPIREQUESTSTATUS/CollectionsAPIADDREPLICAPROP/CollectionsAPIDELETEREPLICAPROP/CollectionsAPIBALANCESHARDUNIQUE/CollectionsAPI

configurationparametersURL/ContentExtractionLibrary

ContentExtractionLibrary/ContentExtractionLibrarycopyfields/CopyfieldsCore

overview/CoreoverviewCoreAdmin

about/CoreAdmintoptoolbar/CoreAdmincentralarea/CoreAdmin

CoreAdmin,centralareastartTime/CoreAdmininstanceDir/CoreAdmindataDir/CoreAdminlastModified/CoreAdminversion/CoreAdminnumDocs/CoreAdminmaxDocs/CoreAdmindeletedDocs/CoreAdminoptimized/CoreAdmincurrent/CoreAdmin


directory/CoreAdminCoreAdmin,toptoolbar

Unload/CoreAdminRename/CoreAdminSwap/CoreAdminReload/CoreAdminOptimize/CoreAdmin

customcache/Customcachecustomdata

indexing/Indexingcustomdatacustomresponsewriter

using/Usingacustomresponsewriter


DDamerau-Levenshteindistancealgorithm/Fuzzydashboard

about/DashboardphysicalandJVMmemory/PhysicalandJVMmemorydisk/Diskusagefiledescriptors/Filedescriptors

databaserecordversusdocument/Thedocument

DataImportHandlermoduleabout/DataImportHandlerdatasources/Datasourcesentities/Documents,entities,andfieldsdocuments/Documents,entities,andfieldsfields/Documents,entities,andfieldstransformer/Transformersentityprocessors/Entityprocessorseventlisteners/Eventlisteners

datasourcesabout/DatasourcesJdbcDataSource/DatasourcesURLDataSource/DatasourcesBinURLDataSource/DatasourcesFileDataSource/DatasourcesBinFileDataSource/DatasourcesContentStreamDataSource/DatasourcesBinContentStreamDataSource/DatasourcesFieldReaderDataSource/DatasourcesFieldStreamDataSource/Datasources

dateformatabout/Date

defaultsimilarity/Defaultsimilaritydeletecommands

issuing/Deletedevelopmentenvironment

settingup/Settingupthedevelopmentenvironmentversioncontrol/Versioncontrolcodestyle/Codestylecode,checkingout/Checkingoutthecodeprojectcreating,inIDE/CreatingtheprojectinyourIDE

diamondarchitectureabout/Master/slavesscenario

Dis


about/TheDisjunctionMaximumqueryparserMax/TheDisjunctionMaximumqueryparser

disjunctionmaxquery/Tiebreakerdisjunctionsumquery/TiebreakerDisMaxqueryparser

about/TheDisjunctionMaximumqueryparserqueryfields/QueryFieldsalternativequery/Alternativequeryminimumnumberofmatches/Minimumshouldmatchphrasefields/Phrasefieldsqueryphraseslop/Queryphraseslopphraseslop/Phraseslopboostqueries/Boostqueriesadditiveboostfunctions/Additiveboostfunctionstieparameter/Tiebreaker

Document/Inputandoutputdatatransferobjectsdocument

about/Thedocumentversusdatabaserecord/Thedocument

documentationabout/Documentationtechnicalinternaldocumentation/Documentationtechnicalexternaldocumentation/Documentationuserdocumentation/Documentation

documentcacheabout/Documentcache

documentsabout/Documents,entities,andfields

dynamicfields/Dynamicfields


E
Eclipse
    URL / Code style
Eclipse IDE for Java Developers
    URL / Prerequisites
eDisMax query parser
    about / The Extended Disjunction Maximum query parser
    fielded search / Fielded search
    phrase bigram field / Phrase bigram and trigram fields
    phrase trigram field / Phrase bigram and trigram fields
    phrase trigram slop / Phrase bigram and trigram slop
    phrase bigram slop / Phrase bigram and trigram slop
    multiplicative boost function / Multiplicative boost function
    user fields / User fields
    lowercase operators / Lowercase operators
ensemble
    about / Cluster management
entities
    about / Documents, entities, and fields
    root entities / Documents, entities, and fields
    sub entities / Documents, entities, and fields
EntityProcessor
    about / Entity processors
entity processors
    SqlEntityProcessor / Entity processors
    FileListEntityProcessor / Entity processors
    LineEntityProcessor / Entity processors
    MailEntityProcessor / Entity processors
    PlainTextEntityProcessor / Entity processors
    SolrEntityProcessor / Entity processors
    TikaEntityProcessor / Entity processors
    XPathEntityProcessor / Entity processors
event listeners
    about / Event listeners
extensions
    about / Other extensions
    clustering module / Clustering
    UIMA Metadata Extraction Library / UIMA Metadata Extraction Library
    MapReduce / MapReduce

F
facet component
    about / Facet
    facet queries / Facet queries
    facet fields / Facet fields
    facet ranges / Facet ranges
    pivot facets / Pivot facets
    interval facets / Interval facets
faceted search / Facet
facet fields / Facet fields
    facet.field / Facet fields
    facet.prefix / Facet fields
    facet.sort / Facet fields
    facet.limit / Facet fields
    facet.offset / Facet fields
    facet.mincount / Facet fields
    facet.missing / Facet fields
    facet.method / Facet fields
    facet.threads / Facet fields
facet queries / Facet queries
facet ranges
    about / Facet ranges
    facet.range / Facet ranges
    facet.range.start / Facet ranges
    facet.range.end / Facet ranges
    facet.range.gap / Facet ranges
facets / Facet
Factory class / Changing the stored value of fields
FastLRUCache / Cached object life cycle
fast vector highlighter / Fast vector highlighter
fielded search / Fielded search
field lists / Field lists
Field query parser / Other available parsers
fields
    about / Documents, entities, and fields
fields, Solr schema
    about / Fields
    static / Static fields
    dynamic / Dynamic fields
    copy / Copy fields
fields attributes, Solr schema
    name / Fields
    type / Fields
    indexed / Fields
    stored / Fields
    required / Fields
    default / Fields
    sortMissingFirst / Fields
    sortMissingLast / Fields
    omitNorms / Fields
    omitPositions / Fields
    omitTermFreqAndPositions / Fields
    termVectors / Fields
    docValues / Fields
field types, Solr schema
    about / Field types
    text analysis process / The text analysis process
    char filters / Char filters
    tokenizer / Tokenizers
    token filters / Token filters
    implementing / Putting it all together
    reference link / Some example field types
field types attributes, Solr schema
    name / Field types
    type / Field types
    sortMissingFirst / Field types
    sortMissingLast / Field types
    indexed / Field types
    stored / Field types
    multiValued / Field types
    omitNorms / Field types
    omitTermsAndFrequencyPositions / Field types
    omitPositions / Field types
    positionsIncrementGap / Field types
    autogeneratePhraseQueries / Field types
    compressed / Field types
    compressThreshold / Field types
field types examples, Solr schema
    about / Some example field types
    string / String
    numeric / Numbers
    Boolean fields / Boolean
    date / Date
    text / Text
    currency / Other types
    binary / Other types
    geospatial types / Other types
    random / Other types
field value cache / Field value cache
file descriptors / File descriptors
filter cache
    about / Filter cache
filter queries / Filter queries
FirstQueryITCase integration test / Integration test server
fl parameter
    about / Field lists
Function query parser / Other available parsers
fuzzy query / Fuzzy

H
hard commit / Update handler and autocommit feature
high availability
    about / Replication factor, leaders, and replicas
highlight component
    about / Highlighting
    parameters / Highlighting
    standard highlighter / Standard highlighter
    fast vector highlighter / Fast vector highlighter
    postings highlighter / Postings highlighter
http / Version control
https / Version control

I
<indexConfig> section, attributes
    writeLockTimeout / Index configuration
    maxIndexingThreads / Index configuration
    useCompoundFile / Index configuration
    ramBufferSizeMB / Index configuration
    ramBufferSizeDocs / Index configuration
    mergePolicy / Index configuration
    mergeFactor / Index configuration
    mergeScheduler / Index configuration
    lockType / Index configuration
IDE project, creating / Creating the project in your IDE
indexed fields
    about / String
indexing configuration
    about / Solr indexing configuration, Index configuration
    general settings / General settings
    update handler / Update handler and autocommit feature
    autocommit feature / Update handler and autocommit feature
    RequestHandler / RequestHandler
    UpdateRequestProcessor / UpdateRequestProcessor
index operations
    about / Index operations
    add / Add
    delete commands, issuing / Delete
    commit / Commit, optimize, and rollback
    optimize / Commit, optimize, and rollback
    rollback / Commit, optimize, and rollback
index process
    extending / Extending and customizing the index process
integration test server
    Solr, running as / Different ways to run Solr, Integration test server
IntelliJ
    URL / Code style
interval facets / Interval facets
Inverse Document Frequency (IDF) / Shards
inverted index
    about / The inverted index

J
Java
    URL, for downloading / Prerequisites
Java Development Kit 7 (JDK) / Prerequisites
Java properties
    and thread dump / Java properties and thread dump
Java Virtual Machine (JVM) / Prerequisites
JConsole / JMX
JIRA
    signing up / Signing up on JIRA
    signing up, URL / Signing up on JIRA
    login form, URL / Signing up on JIRA
JMX
    about / JMX
    URL / JMX
Join query parser / Other available parsers
JVisualVM / JMX
JVM memory
    and physical / Physical and JVM memory
JVM options
    URL / Physical and JVM memory

L
language identifier
    about / Language Identifier
LFUCache / Cached object life cycle
list moderator
    about / Mailing list moderator
load balancing
    about / Replication factor, leaders, and replicas
logging
    about / Logging
LRUCache / Cached object life cycle
Lucene index / File descriptors
Lucene query parser / Other available parsers

M
M2Eclipse (M2E) / Prerequisites
mailing lists
    subscribing to / Subscribing to mailing lists
Management Beans (MBeans) / JMX
MapReduce
    about / MapReduce
MARCXML / An example – SOLR-3191
master/slave scenario
    about / Master/slaves scenario
Maven Cargo Plugin
    URL / Understanding the project structure
more like this search component
    about / More like this
    parameters / More like this

N
1*n relationship / Documents, entities, and fields
numeric type
    about / Numbers

O
Online Public Access Catalogue (OPAC) / An example – SOLR-3191
Online Public Application Catalogue (OPAC) / Fields
OpenCalais
    about / UIMA Metadata Extraction Library
operators
    AND / Terms, fields, and operators
    OR / Terms, fields, and operators
    + / Terms, fields, and operators
    -/NOT / Terms, fields, and operators
optimize
    about / Commit, optimize, and rollback

P
patch
    submitting / Creating and submitting a patch
    creating / Creating and submitting a patch
PDFBox / Content Extraction Library
phrase fields / Phrase fields
pivot facets / Pivot facets
postings highlighter / Postings highlighter
Processor class / Changing the stored value of fields
project structure, Solr development environment
    about / Understanding the project structure
    src/main/java / Understanding the project structure
    src/main/resources / Understanding the project structure
    src/test/java / Understanding the project structure
    src/test/resources / Understanding the project structure
    src/dev/eclipse / Understanding the project structure
    src/solr-home / Understanding the project structure
    pom.xml / Understanding the project structure

Q
query analyzers / Query analyzers
query fields / Query Fields
query handlers
    about / Query handlers
    handlerStart attribute / Query handlers
    requests attribute / Query handlers
    errors attribute / Query handlers
    timeouts attribute / Query handlers
    totalTime attribute / Query handlers
    avgRequestsPerSecond attribute / Query handlers
    avgTimePerRequest attribute / Query handlers
querying
    about / Querying
    search-related configuration / Search-related configuration
    query analyzers / Query analyzers
    query parameters / Common query parameters
query language
    about / Querying
query parameters
    about / Common query parameters, Query parameters
    q / Common query parameters
    start / Common query parameters
    rows / Common query parameters
    sort / Common query parameters
    defType / Common query parameters
    fl / Common query parameters
    fq / Common query parameters
    wt / Common query parameters
    debugQuery / Common query parameters
    explainOther / Common query parameters
    timeAllowed / Common query parameters
    cache / Common query parameters
    omitHeader / Common query parameters
    field lists / Field lists
    filter queries / Filter queries
    defaults / Query parameters
    appends / Query parameters
    invariants / Query parameters
query parser
    about / Query parsers
    Solr query parser / The Solr query parser
    DisMax query parser / The Disjunction Maximum query parser
    eDisMax query parser / The Extended Disjunction Maximum query parser
query phrase slop / Query phrase slop
query result cache
    about / Query Result cache

R
range searches / Ranges
rapid prototyping, Solaritas / Rapid prototyping with Solaritas
Raw query parser / Other available parsers
RealTimeGetHandler / RealTimeGetHandler
repeater
    about / Master/slaves scenario
replica
    about / Replication factor, leaders, and replicas
replication factor
    about / Replication factor, leaders, and replicas
replication mechanism
    commit / Master/slaves scenario
    optimize / Master/slaves scenario
    startup / Master/slaves scenario
repository tree
    URL / Version control
RequestHandler / RequestHandler
response output writers
    about / Response output writers
    xml / Response output writers
    xslt / Response output writers
    json / Response output writers
    csv / Response output writers
    velocity / Response output writers
    javabin / Response output writers
    python / Response output writers
    ruby / Response output writers
    php / Response output writers
rollback / Commit, optimize, and rollback
root-entities / Documents, entities, and fields

S
sample project
    about / The sample project
schema.xml file / schema.xml
schema sections
    about / Other schema sections
    unique key / Unique key
    default similarity / Default similarity
search-related configuration
    about / Search-related configuration
    settings / Search-related configuration
search component
    about / Search components
    query / Query
    facet / Facet
    highlight / Highlighting
    more like this / More like this
    query elevation / Other components
    terms / Other components
    stats / Other components
    spellcheck / Other components
    term vector / Other components
    debug / Other components
search components / Search components
search handler
    about / Search handler
    standard request handler / Standard request handler
    RealTimeGetHandler / RealTimeGetHandler
shards
    about / Shards
    URL / Shards
    using / Shards
    with replication / Shards with replication
size-estimator-lucene-solr.xls
    URL / Prerequisites

soft commit / Update handler and autocommit feature
Solid State Disks (SSD) / Disk usage
Solr
    latest version, downloading / Downloading the right version
    URL, for download bundle / Downloading the right version
    server, setting up / Setting up and running the server
    server, running / Setting up and running the server
    running, as background server / Different ways to run Solr, Background server
    running, as integration test server / Different ways to run Solr, Integration test server
    about / What do we have installed?, Extending Solr
    other resources / Other resources
    real time and indexed data, mixing / Mixing real-time and indexed data
    custom response writer, using / Using a custom response writer
    data, adding to / Adds and deletes
    data, deleting / Adds and deletes
    searching with / Search
    bindings / Other bindings
    requirements, identifying / Identifying your needs
    reference guide, URL / Documentation
    URL / Documentation
Solr, clients
    URL / Other bindings
SOLR-3191
    about / An example – SOLR-3191
    URL / An example – SOLR-3191
solr-x.y.z directory / Setting up and running the server
solr.xml / solr.xml
SolrCloud
    about / SolrServer – the Solr façade, SolrCloud
    URL / SolrCloud
    cluster management / Cluster management
    replication factor / Replication factor, leaders, and replicas
    leaders / Replication factor, leaders, and replicas
    replicas / Replication factor, leaders, and replicas
    durability / Durability and recovery
    recovery / Durability and recovery
    features / The new terminology
    administration console / Administration console
    Collections API / Collections API
    distributed search / Distributed search
    cluster-aware index / Cluster-aware index
Solr community Wiki
    URL / Documentation

solrconfig.xml file / solrconfig.xml
Solr core
    about / The Solr core
Solr data model
    about / Understanding the Solr data model
    document / The document
    inverted index / The inverted index
Solr development environment
    setting up / Setting up a Solr development environment
    prerequisites / Prerequisites
    sample project, importing / Importing the sample project of this chapter
    project structure / Understanding the project structure
Solr extension, GitHub
    URL / Commit, optimize, and rollback
Solr home
    about / Solr home
Solr index / File descriptors
Solritas
    about / Rapid prototyping with Solaritas
    rapid prototyping / Rapid prototyping with Solaritas
Solrj
    about / Solrj
    SolrServer / SolrServer – the Solr façade
    input data transfer object / Input and output data transfer objects
    output data transfer object / Input and output data transfer objects
Solr query parser
    about / The Solr query parser
    terms / Terms, fields, and operators
    fields / Terms, fields, and operators
    operators / Terms, fields, and operators
    boosts / Boosts
    wildcard characters / Wildcards
    fuzzy query / Fuzzy
    proximity / Proximity
    range searches / Ranges
Solr schema
    about / The Solr schema
    field types / Field types
    fields / Fields
SolrServer
    about / SolrServer – the Solr façade
    EmbeddedSolrServer / SolrServer – the Solr façade
    HttpSolrServer / SolrServer – the Solr façade
    LBHttpSolrServer / SolrServer – the Solr façade
    ConcurrentUpdateSolrServer / SolrServer – the Solr façade
    CloudSolrServer / SolrServer – the Solr façade
Solr source repository
    URL / Physical and JVM memory
sort fields
    about / String

Spatial filter query parser / Other available parsers
SQLEntityProcessor
    about / Entity processors
standalone instance, of Solr
    about / Standalone instance
standalone Solr instance
    installing / Installing a standalone Solr instance
    prerequisites / Prerequisites
standard highlighter / Standard highlighter
standard request handler
    about / Standard request handler
    search components / Search components
    query parameters / Query parameters
static fields / Static fields
stored value, of fields
    modifying / Changing the stored value of fields
string type
    about / String
    indexed fields / String
    sort fields / String
sub-entities / Documents, entities, and fields
subversion
    about / Version control
Surround query parser / Other available parsers

T
technical external documentation / Documentation
technical internal documentation / Documentation
term
    about / The text analysis process
Term query parser / Other available parsers
text
    about / Text
text analysis process
    about / The text analysis process
    position increment / The text analysis process
    start and end offset / The text analysis process
    payload / The text analysis process
thread dump
    and Java properties / Java properties and thread dump
thresholds, for triggering auto-commits
    maxDocs / Update handler and autocommit feature
    maxTime / Update handler and autocommit feature
tie parameter / Tie breaker
token filters
    about / Token filters
    reference link / Token filters
tokenizer
    about / Tokenizers
    reference link / Tokenizers
transformer
    about / Transformers
transformers
    URL / Field lists
troubleshooting
    about / Troubleshooting, Troubleshooting
    UnsupportedClassVersionError error / UnsupportedClassVersionError
    failed to read artifact descriptor / The “Failed to read artifact descriptor” message
    multivalued fields / Multivalued fields and the copyField directive
    copyField directive / Multivalued fields and the copyField directive, Required fields and the copyField directive
    copyField input value / The copyField input value
    required fields / Required fields and the copyField directive
    stored text, immutable / Stored text is immutable!
    data not indexed / Data not indexed
troubleshooting, Solr
    about / Troubleshooting, No score is returned in response

U
UIMA Metadata Extraction Library / UIMA Metadata Extraction Library
unique key / Unique key
UnsupportedClassVersionError error / UnsupportedClassVersionError
update handler / Update handler and autocommit feature
update handlers
    about / Update handlers
    commits attribute / Update handlers
    autocommit maxTime attribute / Update handlers
    autocommits attribute / Update handlers
    soft autocommits attribute / Update handlers
    optimizes attribute / Update handlers
    rollbacks attribute / Update handlers
    expungeDeletes attribute / Update handlers
    docsPending attribute / Update handlers
    adds attribute / Update handlers
    deletesById attribute / Update handlers
    deletesByQuery attribute / Update handlers
    errors attribute / Update handlers
    cumulative_adds / Update handlers
    cumulative_deletesById / Update handlers
    cumulative_deletesByQuery / Update handlers
    cumulative_errors / Update handlers
UpdateRequestProcessor / UpdateRequestProcessor
user documentation / Documentation
user guide / Documentation

W
wildcard characters / Wildcards
